

IMAGE  UNDERSTANDING  WORKSHOP 


APRIL 1980


Sponsored by:

Information Processing Techniques Office
Defense Advanced Research Projects Agency


Hosted by:

University  of  Maryland 
Computer  Vision  Laboratory 


STEPS  IN  THE  RECOGNITION 
OF  CULTURAL  FEATURES 


Science  Applications,  Inc 


IMAGE  UNDERSTANDING 


Proceedings of a Workshop

Held at

College Park, Maryland

April 30, 1980


Sponsored  by  the 
Defense  Advanced  Research  Projects  Agency 


Science Applications, Inc.
Report Number SAI-81-170-WA
Lee S. Baumann
Workshop Organizer and
Proceedings Editor

This report was supported by
The Defense Advanced Research
Projects Agency under DARPA
Order No. 3456, Contract No. MDA903-80-C-0188,
Defense Supply Service, Washington, D.C.


Approved for public release;
distribution unlimited

United States Government.


UNCLASSIFIED

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)

1.   REPORT NUMBER: SAI-81-170-WA
2.   GOVT ACCESSION NO.:
3.   RECIPIENT'S CATALOG NUMBER:
4.   TITLE (and Subtitle): IMAGE UNDERSTANDING,
     Proceedings of a Workshop, April, 1980
5.   TYPE OF REPORT & PERIOD COVERED: SEMI-ANNUAL TECHNICAL,
     NOVEMBER, 1979-APRIL, 1980
6.   PERFORMING ORG. REPORT NUMBER:
7.   AUTHOR(S): LEE S. BAUMANN (Ed.)
8.   CONTRACT OR GRANT NUMBER(S): MDA903-80-C-0188
9.   PERFORMING ORGANIZATION NAME AND ADDRESS:
     SCIENCE APPLICATIONS, INC.
     1911 North Fort Myer Drive, Suite 1200
     Arlington, Virginia 22209
10.  PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:
     ARPA ORDER No. 3456
11.  CONTROLLING OFFICE NAME AND ADDRESS:
     Defense Advanced Research Projects Agency
     1400 Wilson Boulevard
     Arlington, Virginia 22209
12.  REPORT DATE:
13.  NUMBER OF PAGES:
14.  MONITORING AGENCY NAME & ADDRESS (if different from
     Controlling Office):
15.  SECURITY CLASS. (of this report): UNCLASSIFIED
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:
16.  DISTRIBUTION STATEMENT (of this Report):
     APPROVED FOR RELEASE; DISTRIBUTION UNLIMITED
17.  DISTRIBUTION STATEMENT (of the abstract entered in Block 20,
     if different from Report):
18.  SUPPLEMENTARY NOTES:
19.  KEY WORDS (Continue on reverse side if necessary and identify
     by block number):
     Digital Image Processing; Image Understanding; Scene Analysis;
     Edge Detection; Image Segmentation; CCD Arrays; CCD Processors
20.  ABSTRACT (Continue on reverse side if necessary and identify
     by block number):
     This document contains the technical papers and outlines of
     semi-annual progress reports presented by the research
     activities in Image Understanding sponsored by the Information
     Processing Techniques Office, Defense Advanced Research
     Projects Agency. The papers were presented at a workshop
     conducted on 30 April 1980 in College Park, Maryland.

DD FORM 1473   EDITION OF 1 NOV 65 IS OBSOLETE

UNCLASSIFIED

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


TABLE OF CONTENTS

                                                                PAGE

FOREWORD ......................................................... i

AUTHOR INDEX ................................................... iii


SESSION I - PROGRAM REVIEWS BY PRINCIPAL INVESTIGATORS

"Image Understanding Using Overlays"
    A. Rosenfeld; University of Maryland ......................... 1

"Image Understanding Research at CMU: A Progress Report"
    J.R. Kender and R. Reddy; Carnegie-Mellon University ........ 13

"Understanding Images at MIT: Representative Progress"
    K. Stevens, K. Nishihara, B. Schunck, the Staff;
    Massachusetts Institute of Technology ....................... 15

"Progress at the Rochester Image Understanding Project"
    J.A. Feldman, K.R. Sloan, Jr.; The University of Rochester .. 20

"Spatial Understanding"
    T.O. Binford; Stanford University ........................... 24

"The SRI Image Understanding Program"
    M.A. Fischler; SRI International ............................ 28

"Progress in Image Understanding Research at USC"
    R. Nevatia, A.A. Sawchuk; University of Southern California . 31


SESSION II - TECHNICAL PAPERS

"Toward the Recognition of Cultural Features"
    M. Tavakoli; University of Maryland ......................... 33

"Atmospheric Modelling for the Generation of Albedo Images"
    R.W. Sjoberg, B.K.P. Horn;
    MIT Artificial Intelligence Laboratory ...................... 58

"Random Sample Consensus: A Paradigm for Model Fitting with
Applications to Image Analysis and Automated Cartography"
    M.A. Fischler, R.C. Bolles; SRI International ............... 71

"Semantic Description of Aerial Images Using Stochastic Labeling"
    O.D. Faugeras, K.E. Price; University of Southern California  89

"Representing and Reasoning about Partially Specified Scenes"
    R.A. Brooks, T.O. Binford; Stanford University .............. 95


TABLE OF CONTENTS (CONT.)


SESSION II - TECHNICAL PAPERS (CONT.)

"... Descriptions"
    D.G. Lowe; ...

"..."
    ...; ... Intelligence Laboratory

"Experience with ... Transform"
    K.R. Sloan, Jr., D.H. Ballard; The University of Rochester

"The Gaussian Sphere: A ... of Surface Orientation"
    J.R. Kender; ...

"... Stereo Vision"
    D.B. Gennery; Stanford University

"Edge-Based Stereo Correlation"
    ...; Stanford University

"Investigation of VLSI ... for ... Processing"
    W.L. Eversole, D.J. Mayer; ... Incorporated

"Application of LSI ... Architectures"
    S.D. Fouse, G.R. Nudd; ... Laboratories

"A 'Non-Correlation' A..."
    O. Firschein; ...

"Bootstrap Stereo"
    M.J. Hannah; Lockheed ... Laboratory

"... Processor"
    G.R. Allen; Control ...


FOREWORD


The Eleventh Image Understanding Workshop [...] by the Defense
Advanced Research Projects Agency (DARPA), Information Processing
Techniques [...] Park, Maryland on April [...] of DARPA opened the
streamlined one day [...] receiving copies of these proceedings.

In response to lower travel budgets the format of this workshop was
limited to one full day rather than the two [...] of continued
austerity in [...] two day sessions conducted each spring.

This workshop was hosted by [...] express his appreciation to
Dr. Rosenfeld for his [...] assistance in planning for the workshop
and for his efforts [...] and organizing the program. Appreciation
is [...] of Science Applications, Incorporated for their [...] for
putting together these proceedings under difficult deadlines.

The materials for the cover of this document were [...]
Dr. Rosenfeld as representative of some of the work underway in his
laboratory. The picture [...] is an image of [...] are
respectively: line segments [...] on these pictures is contained in
the included paper entitled "Toward the Recognition of Cultural
Features," by Dr. M. Tavakoli. [...] Blomberg of the Art Department
of Science Applications, Incorporated.


Lee S. Baumann
Science Applications, Inc.
Workshop Organizer




AUTHOR INDEX


NAME                  PAGE

G.R. Allen            209
H. Baker              168
D.H. Ballard          150
T.O. Binford          24, 95
R.C. Bolles           71
R.A. Brooks           95
W.L. Eversole         182
O.D. Faugeras         89
J.A. Feldman          20
O. Firschein          195
M.A. Fischler         28, 71
S.D. Fouse            190
D.B. Gennery          161
W.E.L. Grimson        128
M.J. Hannah           201
A. Helland            176
B.K.P. Horn           58
J.R. Kender           13, 157
D.G. Lowe             121
D.J. Mayer            182
R. Nevatia            31
K. Nishihara          15
G.R. Nudd             190
K.E. Price            89
L.H. Quam             104
R. Reddy              13
A. Rosenfeld          1, 112
A.A. Sawchuk          31
B. Schunck            15
R.W. Sjoberg          58
K.R. Sloan, Jr.       20, 150
K. Stevens            15
M. Tavakoli           33
V.S. Wong             190


IMAGE UNDERSTANDING USING OVERLAYS
PROJECT STATUS REPORT, 1 OCTOBER 1979-30 MARCH 1980
CONTRACT DAAG-53-76C-0138 (DARPA ORDER 3206)


Azriel Rosenfeld
Principal Investigator


Computer Vision Laboratory
Computer Science Center
University of Maryland
College Park, MD 20742

ABSTRACT

Current activities on the project are reviewed
under the following headings:

1. Segmentation and texture analysis

2. Local and global shape analysis

3. Hierarchical representation


INTRODUCTION

This project is concerned with the study of
advanced techniques for the analysis of reconnaissance
imagery. It is being conducted under Contract
DAAG-53-76C-0138 (DARPA Order 3206), monitored
by the U.S. Army Night Vision Laboratory, Ft. Belvoir,
VA (Dr. George Jones). The Westinghouse
Systems Development Division, as a subcontractor,
is investigating implementation of the techniques
being developed by Maryland, particularly in the
area of relaxation; their efforts are reviewed in
separate quarterly reports.

The current phase of the project is concerned
with the development and application of advanced
techniques for image processing, feature detection,
segmentation, texture and shape analysis, and region
representation. These aspects are reviewed in
the following sections. This report deals primarily
with the work done during the past six months;
activities during earlier periods were reviewed in
previous reports [1-6]. Some of the topics are
discussed only briefly, since they are treated in
greater detail in individual technical reports and
Image Understanding Workshop papers.


SEGMENTATION AND TEXTURE ANALYSIS

2.1 Edge detection

Edges are generally detected by thresholding
the output of some type of difference operator, but
the choice of a threshold for this purpose is not
easy, since the histogram of difference values
tends to fall off smoothly from a peak near zero.
Threshold selection becomes easier if we suppress
nonmaximum difference values (in the gradient
direction) before histogramming. As Figure 1 shows,
this yields a histogram composed of a sharpened
peak near zero together with small sets of higher
values; the latter are likely to be good choices
for edge points [7].
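The suppression step can be illustrated with a short sketch. It is not the procedure of [7]; it assumes a Sobel difference operator (as in Figure 1), a gradient direction quantized to four neighbor axes, and NumPy. The function names `sobel_gradient` and `suppress_nonmaxima` are hypothetical.

```python
import numpy as np

def sobel_gradient(img):
    """Return (gx, gy) Sobel differences; gx is along columns, gy along rows."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return gx, gy

def suppress_nonmaxima(gx, gy):
    """Keep a magnitude only if it is not exceeded by either neighbor
    along the (quantized) gradient direction; set all others to zero."""
    mag = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    sector = np.round(angle / 45.0).astype(int) % 4
    # (row, col) steps for gradient directions 0, 45, 90, 135 degrees.
    offsets = [(0, 1), (1, 1), (1, 0), (1, -1)]
    rows, cols = mag.shape
    p = np.pad(mag, 1, mode="constant")
    out = np.zeros_like(mag)
    for s, (dr, dc) in enumerate(offsets):
        ahead = p[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
        behind = p[1 - dr:1 - dr + rows, 1 - dc:1 - dc + cols]
        keep = (sector == s) & (mag >= ahead) & (mag >= behind)
        out[keep] = mag[keep]
    return out
```

Histogramming the output of `suppress_nonmaxima` rather than the raw magnitudes moves the suppressed values into the zero bin, which is the sharpening of the peak near zero described above.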

Difference operators for edge detection can be
designed by fitting a polynomial surface to the
gray levels in the neighborhood of a point, and taking
the gradient of that surface as an estimate of
the image gradient. This approach can be generalized
to the design of operators for surface detection
in three- (or higher-) dimensional arrays of
data, such as those obtained by reconstructing objects
from x-ray projections, by stacking cross-sections,
or by stacking successive frames in a
time sequence of images. Surface detection provides
results that are more accurate and more reliable
than those obtained by applying two-dimensional
edge detectors to the individual slices, as
can be seen from Figure 2 [8]. This work is part
of a Ph.D. thesis on processing and segmentation
of three-dimensional arrays.
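As a minimal sketch of the fitting idea, the following assumes the simplest case: a first-degree (hyperplane) fit over a 3x3x3 neighborhood, solved by least squares with NumPy. The operators actually derived in [8] may differ; `plane_fit_gradient` is a hypothetical name.

```python
import numpy as np

def plane_fit_gradient(volume, z, y, x):
    """Fit f ~ a + bz*dz + by*dy + bx*dx to the 3x3x3 neighborhood of
    (z, y, x) and return the fitted gradient (bz, by, bx).
    The point must not lie on the border of the array."""
    nb = np.asarray(volume, dtype=float)[z - 1:z + 2, y - 1:y + 2, x - 1:x + 2]
    dz, dy, dx = np.meshgrid([-1, 0, 1], [-1, 0, 1], [-1, 0, 1], indexing="ij")
    design = np.column_stack(
        [np.ones(27), dz.ravel(), dy.ravel(), dx.ravel()]
    )
    coef, *_ = np.linalg.lstsq(design, nb.ravel(), rcond=None)
    return coef[1:]
```

Because the neighborhood coordinates are symmetric about zero, each fitted slope reduces to a weighted difference of the two opposite 3x3 faces, which is a scaled 3-d Prewitt-style operator of the kind referred to in the caption of Figure 2.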

2.2 Pixel classification and texture analysis

During the past reporting period, an M.S.
thesis was completed [9] on a general-purpose software
package for performing relaxation operations
on arrays of pixels. This package allows the user
to specify the process for computing initial
probabilities, the neighborhood to be used, and the
probability adjustment algorithm (including the
compatibility coefficients). As an application, a
light/dark relaxation process was implemented;
examples of this process can be found in earlier
status reports [2,5]. Some analytical results
regarding such two-label relaxation processes will be
presented in a forthcoming quarterly report on the
Westinghouse subcontract.

Relaxation has been successfully used to improve
pixel classification based on color, as
reported elsewhere. It can similarly be used to
improve pixel classification in single-band images
based on local property values such as gray level
and "busyness". Figure 3a shows a house picture
containing five principal types of regions: sky,
grass, bushes, brick, and shadow. The bush and
shadow classes are very difficult to distinguish;
they have similar mean vectors, and the bush class
is more variable, so that a maximum-likelihood
classification (based on Gaussian fitting to the
clusters defined by hand segmentation) misclassifies
most of the shadow pixels as bush (Figure 3b).
The results are greatly improved when relaxation
is used to adjust the initial class probabilities
for each pixel, based on those of its neighbors;
see Figure 3c. Similar improvement is obtained
when the busyness values are iteratively smoothed,
e.g. by median filtering, prior to clustering and
classification (Figure 3d). Further details on
these experiments can be found in [10].
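The adjustment of class probabilities from neighboring pixels can be sketched as follows. This is one classical nonlinear update rule, assumed here for illustration; the text does not say which adjustment algorithm was used in the experiments of [10]. The 8-neighborhood, the compatibility matrix with entries in [-1, 1], and the name `relax` are all assumptions.

```python
import numpy as np

def relax(prob, compat, iterations=5):
    """prob: array (K, H, W) of class probabilities summing to 1 per pixel.
    compat: array (K, K) of compatibility coefficients in [-1, 1].
    Returns the probabilities after the given number of iterations."""
    p = np.asarray(prob, dtype=float).copy()
    _, h, w = p.shape
    for _ in range(iterations):
        padded = np.pad(p, ((0, 0), (1, 1), (1, 1)), mode="edge")
        # Average the probabilities of the 8 neighbors of every pixel.
        nb = np.zeros_like(p)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr or dc:
                    nb += padded[:, 1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
        nb /= 8.0
        # Support for class k: compatibility-weighted neighbor evidence.
        q = np.einsum("kl,lhw->khw", compat, nb)
        p = p * (1.0 + q)
        p /= p.sum(axis=0, keepdims=True)
    return p
```

With two labels (light and dark) and a compatibility matrix that rewards equal labels, an isolated pixel whose initial probabilities disagree with its surroundings is pulled toward the neighborhood's label within a few iterations.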

Iterative smoothing can also be used to improve
the results of texture classification using
texture features derived from small windows, as
described in [11-12].

2.3 Interactive segmentation

An interactive image segmentation system is
being designed as a contribution to the DARPA/DMA
Testbed. The system allows the user to designate
samples of two classes (e.g., objects and background).
It analyzes the samples, designs a classifier
to discriminate them, and displays the
classification results to the user for evaluation; if
errors are designated, the system attempts to
modify the classifier so as to eliminate them. The
user need not know anything about the classification
process or the features that are used for
classification; the system selects them from a
prespecified repertoire. The current, pilot version
of the system classifies pixels based on gray level
only; future versions will make use of various
types of local features and will allow more than
two classes.


LOCAL  AND  GLOBAL  SHAPE  ANALYSIS 

3.1 Corner detection

Several types of operators have been developed
that respond to the presence of "corners" (i.e.,
sharp changes in edge direction) in an unsegmented
image [13]. For example, one can express the rate
of change in the gradient direction in terms of
first and second derivative operators, and then
approximate these by difference operators; or one
can simply compute a digital gradient direction,
and estimate its rate of change at P by comparing
it with the directions at the appropriate neighbors
of P. To measure "cornerity", the rate of change
in gradient direction should be multiplied by the
gradient magnitude, since we are only interested in
corners that lie on edges. Figure 4 shows a display
of cornerity values for a simple grayscale
image; the results seem reasonable.
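The first of these formulations can be sketched in a few lines. The sketch assumes that the rate of change of gradient direction is measured along the edge (perpendicular to the gradient), and that derivatives are approximated with `numpy.gradient`. The operators of [13] may use different difference approximations, and the name `cornerity` is used here only for illustration. Under these assumptions the product of direction change and gradient magnitude is $(I_{xx} I_y^2 - 2 I_{xy} I_x I_y + I_{yy} I_x^2) / (I_x^2 + I_y^2)$.

```python
import numpy as np

def cornerity(img, eps=1e-12):
    """Rate of change of gradient direction along the edge, multiplied
    by gradient magnitude. eps guards against division by zero in flat
    areas (an implementation choice, not part of the definition)."""
    img = np.asarray(img, dtype=float)
    iy, ix = np.gradient(img)      # derivatives along rows (y), columns (x)
    iyy, _ = np.gradient(iy)
    ixy, ixx = np.gradient(ix)
    num = ixx * iy**2 - 2.0 * ixy * ix * iy + iyy * ix**2
    return num / (ix**2 + iy**2 + eps)
```

On a linear ramp the measure is zero everywhere, and on a flat region both numerator and denominator vanish, so only curved edges produce a response.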

3.2 Collinearity and parallelism

Collinear and parallel (or "antiparallel")
sets of edge and line segments are important elements
in the description of many types of scenes.
The following paragraphs describe general-purpose
programs for analyzing collinearity and parallelism.
A more specialized program that links edge segments
based on gray level, as well as geometric, criteria,
with application to the detection of buildings and
roads on aerial imagery, is described in a separate
paper in these proceedings [14].

The "collinearity strength" of two segments
depends on several factors:


(a) The distance between their nearer ends,
relative to their lengths

(b) The angles that they make with the line
joining their nearer ends

(c) The distance between their farther ends,
relative to the nearer-end distance and
lengths.

A collinearity strength measure based on a combination
of these factors gives generally reasonable
results, as illustrated in Figure 5 [15].

Collinear segments can be grouped into
"clusters" based on their relative sizes and
separations. Several types of cluster merit functions
can be used for this purpose; a good figure of
merit should depend on both the segment density and
the total segment length in the given cluster.
Examples of clusters defined by maximizing such a
figure of merit are given in Figure 6. A report
on these experiments is in preparation [16].

Segments can also be linked based on parallelism
(or, in the case of edge segments, antiparallelism:
the dark sides of the edges should face in
opposite directions). The figure of merit for this
linking process should depend on the separation of
the segments, their lengths and the amount by which
they overlap, as well as their parallelism (i.e.,
the angle between them). Mutually best pairs based
on this merit function can be linked, and the process
can then be repeated with the linked pairs
eliminated. For examples of the results obtained
using this approach see [17].
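The linking loop itself (link mutually best pairs, remove them, repeat) can be sketched independently of the merit function. In the sketch below the merit values are supplied as a symmetric matrix; how they are computed from separation, length, overlap and angle is left open, since the text does not give the formula. Treating a merit of zero or less as "no candidate" is an assumption, and `link_mutual_best` is a hypothetical name.

```python
def link_mutual_best(merit):
    """merit: symmetric square matrix (list of lists); merit[i][j] is the
    figure of merit for linking segments i and j. Returns the linked
    pairs in the order in which they were found."""
    free = set(range(len(merit)))
    pairs = []
    while True:
        # Each remaining segment picks its best remaining partner.
        best = {}
        for i in sorted(free):
            candidates = [j for j in sorted(free)
                          if j != i and merit[i][j] > 0]
            if candidates:
                best[i] = max(candidates, key=lambda j: merit[i][j])
        # Link the pairs that chose each other.
        new_pairs = [(i, j) for i, j in best.items()
                     if i < j and best.get(j) == i]
        if not new_pairs:
            return pairs
        for i, j in new_pairs:
            pairs.append((i, j))
            free -= {i, j}
```

A segment whose preferred partner is taken in one round gets another chance in the next round, when that partner has been eliminated and its second choice may now be mutual.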

3.3 The medial axis

The medial axis (MA) of a set S is defined as
the set of centers (and radii) of the maximal
"disks" contained in S, or equivalently, as the set
of points of S whose distances from the complement
$\bar{S}$ are local maxima. It can be used as a compact
representation of S, and can also serve as a basis
for approximating S by a union of "generalized
ribbons" (= connected arcs of MA points, with radii
specified by a "width function" defined along each
arc).

The MA is sensitive to noise, i.e., to errors
in extracting the set S; thus it would be desirable
to define it directly for unsegmented images. This
can be done using a "gray-weighted" concept of distance,
but it is hard to reconstruct the image from
such an MA. Another possibility (the "SPAN":
Spatial Piecewise Approximation by Neighborhoods)
is to approximate the image by maximal homogeneous
disks, but the approximation process is computationally
costly. Still another alternative is to
assign an MA score to each point P based on the
presence of high gradient values at pairs of positions
symmetrically located with respect to P; but
this process turns out to be quite sensitive to
noise.

A more robust approach to defining an MA for
unsegmented images is based on a characterization
of the MA of a set S in terms of shrinking and
expanding operations performed on S. Let $S^{(-n)}$
denote the result of shrinking S (i.e., deleting
its border) n times, and similarly let $S^{(n)}$ denote
the result of expanding S n times. Then it is not
hard to see that $S_k = S^{(-k)} - (S^{(-k-1)})^{(1)}$
is the set of MA points at distance k from
$\bar{S}$, so that $\bigcup_k S_k$ is the MA. To generalize this to
unsegmented images, we use local MIN operations
instead of shrinking, and local MAX operations instead
of expanding; we can then define the "MMMAT"
as $\sum_k S_k$. Examples of such MMMATs are
shown in Figure 7; for further details see [18].
Approximations to the image can be reconstructed
by using only points having strong MMMAT values,
and k's that make strong contributions to these
values. For examples of such reconstructions see
Figure 8; a report on this work is in preparation.
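The shrink/expand characterization translates directly into code. The sketch below assumes 3x3 neighborhoods for the local MIN and MAX, non-negative gray levels, and zero values outside the image; these choices are illustrative and may differ from those of [18]. On a 0/1 image the MIN and MAX reduce to shrinking and expanding, so the same function returns the binary MA.

```python
import numpy as np

def _local(f, reducer):
    """Apply a 3x3 local MIN or MAX; outside the image counts as 0."""
    h, w = f.shape
    p = np.pad(f, 1, mode="constant", constant_values=0.0)
    shifted = [p[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return reducer(shifted, axis=0)

def mmmat(f):
    """Sum over k of S_k = f^(-k) - (f^(-k-1))^(1), where f^(-k) is k
    local MINs and ^(1) is one local MAX. Assumes f >= 0."""
    shrunk = np.asarray(f, dtype=float)
    total = np.zeros_like(shrunk)
    while shrunk.any():
        shrunk_next = _local(shrunk, np.min)
        total += shrunk - _local(shrunk_next, np.max)   # this is S_k
        shrunk = shrunk_next
    return total
```

For a 3-by-7 rectangle of ones, one shrinking leaves the middle row of five pixels and a second shrinking leaves nothing, so the result is exactly that middle row, as expected for square "disks".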

3.4 Shape segmentation

Various types of shape features, such as protrusions
and intrusions, can be detected by comparing
boundary arcs with their chords; for example,
if the chord is much shorter than the arc, or if
the arc does not lie close to the chord, that arc
must be a protrusion or intrusion. Suppose that we
measure various arc-chord figures of merit (e.g.,
arc length divided by chord length, or area between
arc and chord divided by squared chord length) for
every arc. In many cases, extrema of such figures
of merit correspond to arcs that are natural
pieces of the shape, as illustrated in Figure 9.
However, this approach sometimes leads to segmentations
that are not intuitively plausible, since the
extrema depend only on (e.g.) the curve's slopes at
the arc endpoints, and not on the shape of the arc
between the endpoints; see [19].
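The two figures of merit named above are simple to compute for an arc given as a sequence of boundary points. The following sketch assumes a polygonal arc with distinct endpoints; `arc_chord_merits` is a hypothetical name, and the area is taken as the net area enclosed by the arc and its chord (shoelace formula), which equals the area between them only when the arc does not cross its chord.

```python
import math

def arc_chord_merits(points):
    """points: list of (x, y) along the arc, endpoints included.
    Returns (arc length / chord length, area / chord length squared)."""
    arc_length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    # Close the arc with its chord and apply the shoelace formula.
    closed = points[1:] + points[:1]
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(points, closed))
    area = abs(twice_area) / 2.0
    return arc_length / chord, area / chord**2
```

A straight arc gives (1, 0); the further the arc bulges away from its chord, the larger both values become.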

Work on shape segmentation using relaxation,
described in earlier reports, is being extended to
handle shapes with major occlusions or missing
parts; the results will be described in a forthcoming
report.


HIERARCHICAL REPRESENTATION

4.1 Quadtrees and hextrees

The quadtree algorithms developed on this project
usually involve locating neighbors of a given
block of the image by searching the tree, starting
from the corresponding node. A general treatment
of neighbor finding in quadtrees, including an
analysis of the computational costs, can be
found in [20].

Quadtrees are defined on the basis of recursive
subdivision into quadrants; they involve square
blocks, and four blocks of a given size constitute
a block of the next larger size. For some purposes
it may be desirable to define a representation
based on hexagonal rather than square blocks, since
such a representation would be less sensitive to
rotation. Hexagons cannot be combined to form
exact hexagons of a larger size, but one can combine
seven hexagons into a "ragged" hexagon, and
this process can be iterated, as illustrated in
Figure 10. A detailed discussion of how to define
"hexagonal pyramids" in this way can be found in [21].


4.2 Quadtree shape approximation

When a region is represented by a quadtree,
the upper levels of the tree, corresponding to
large blocks of the image, define approximations
to the region. These approximations can be used to
estimate shape properties such as moments, and to
speed up shape matching by eliminating gross
mismatches rapidly [22]. For example, the coordinates
of the centroid of a shape can be estimated to a
fraction of a pixel using quadtree approximations,
as illustrated in Figure 11. This should make it
possible to track moving shapes quite accurately;
even though the quadtree itself changes radically
when a shape is shifted, the moment approximations
remain stable. Similarly, the approximations can
be used to determine upper and lower bounds on the
mismatch area; thus if we are matching a given
shape S against a collection of stored shapes
$S_1, S_2, \ldots$, we can eliminate any $S_i$ such that the
lower bound on the mismatch of S with $S_i$ exceeds
the upper bound on the mismatch of S with some
other shape. This error bounding process is
illustrated in Figure 12.
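The elimination rule at the end of this paragraph can be written down directly once the bounds are available. The sketch below takes the lower and upper mismatch bounds as given (how they are obtained from the quadtree approximations is described in [22] and not reproduced here); `prune_candidates` and the dictionary layout are assumptions for illustration.

```python
def prune_candidates(bounds):
    """bounds: dict mapping a stored shape's name to (lower, upper)
    bounds on its mismatch area with the query shape.
    Returns the names that cannot yet be eliminated."""
    smallest_upper = min(upper for _, upper in bounds.values())
    # A shape is eliminated when its lower bound exceeds the upper
    # bound of some other shape, i.e. the smallest upper bound.
    return [name for name, (lower, _) in bounds.items()
            if lower <= smallest_upper]
```

For example, with bounds A: (2, 5), B: (6, 9), C: (4, 7), shape B is eliminated because its mismatch is at least 6 while A's is at most 5; A and C remain and would be compared again using finer approximations.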

4.3 Hierarchical image processing
and segmentation

Extensive work is now in progress on the use
of pyramid structures for image processing and
segmentation. This work is summarized in a separate
paper in these Proceedings [23]. The following are
some of the chief areas of investigation:


a) Iterated local convolution operations can
be used to produce large-kernel convolutions
having almost exactly Gaussian kernels.
These can in turn be combined to
yield various types of circular or elongated
center-surround operators [24].

b) Image pyramids can be defined in which the
blocks at each level overlap; this largely
negates the objections to conventional
power-of-2 pyramids on grounds of shift
sensitivity.

c) In an overlapped pyramid, by associating
nodes with their most similar ancestors,
one can establish linked clusters of nodes
representing homogeneous regions; this
facilitates smoothing or segmentation of
the regions.

d) Local operations in a pyramid can be used
to detect simple types of objects in the
image, and to extract these objects by
local thresholding. This approach was
applied to blob-like objects in an earlier
report; it has now been extended to
streak-like objects [25].

e) Pyramids can be used to define quadtree
approximations to an image ("Q-images"),
based on the concept that a block is
subdivided only if it is unhomogeneous.

f) The use of Q-images facilitates segmentation
by thresholding, since the peaks in
the histogram of a Q-image (where each
block contributes its mean gray level, a
number of times proportional to its size)
tend to be sharper and more cleanly separated.
The histogram is further improved
when we eliminate small blocks, since these
tend to lie on region borders. Conversely,
if we histogram only the small blocks, we
obtain a unimodal histogram whose mean is
a good threshold [26]. More generally,
we can find blocks in the quadtree
corresponding to peaks in the histogram, and
apply local thresholds in the vicinity of
these blocks to extract the appropriate
regions [27].

g) Q-images can also be used to improve edge
detection, based on establishing correspondences
between edges in the Q-image and
edges in the original image [28].
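Items (e) and (f) can be illustrated with a small sketch. It assumes a square image whose side is a power of 2, a homogeneity test based on the gray level range within a block (the criterion actually used in [26] is not given here), and a block weight equal to its area. The names `q_image_blocks` and `q_histogram` are hypothetical.

```python
import numpy as np

def q_image_blocks(img, tol=0.0):
    """Subdivide recursively; a block is split only if its gray level
    range exceeds tol. Returns leaf blocks as (row, col, size, mean)."""
    img = np.asarray(img, dtype=float)
    blocks = []

    def split(r, c, size):
        block = img[r:r + size, c:c + size]
        if size == 1 or block.max() - block.min() <= tol:
            blocks.append((r, c, size, float(block.mean())))
            return
        half = size // 2
        for dr in (0, half):
            for dc in (0, half):
                split(r + dr, c + dc, half)

    split(0, 0, img.shape[0])
    return blocks

def q_histogram(blocks, min_size=1):
    """Histogram of block means, each weighted by block area; blocks
    smaller than min_size are left out."""
    hist = {}
    for _, _, size, mean in blocks:
        if size >= min_size:
            hist[mean] = hist.get(mean, 0) + size * size
    return hist
```

Raising `min_size` drops the small blocks, which arise where a block straddles a boundary or contains a deviant pixel, and leaves only the contributions of large homogeneous blocks in the histogram.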


REFERENCES

1. Semi-annual report, 1 April-30 September 1978.

2. Semi-annual report, 1 October 1978-30 March
1979.

3. Semi-annual report, 1 April-30 September 1979.

4. Project status report, in Proceedings, Image
Understanding Workshop, November 1978, 20-27.

5. Project status report, in Proceedings, Image
Understanding Workshop, April 1979, 14-24.

6. Project status report, in Proceedings, Image
Understanding Workshop, November 1979, 166-175.

7. L. Kitchen, Non-maximum suppression of gradient
magnitudes makes them easier to threshold, in
L. Kitchen, A. Broder, and A. Rosenfeld, Two
notes on digital edges and lines, TR-885,
Computer Vision Laboratory, Computer Science
Center, University of Maryland, College Park,
MD, March 1980.

8. D. G. Morgenthaler and A. Rosenfeld,
Multidimensional edge detection by hypersurface
fitting, TR-877, Computer Vision Laboratory,
Computer Science Center, University of Maryland,
College Park, MD, February 1980.

9. R. C. Smith, A general-purpose software package
for array relaxation, TR-839, Computer Vision
Laboratory, Computer Science Center, University
of Maryland, College Park, MD, December 1979.

10. P. A. Dondes and A. Rosenfeld, Pixel
classification based on gray level and local
"busyness", TR-874, Computer Vision Laboratory,
Computer Science Center, University of Maryland,
College Park, MD, March 1980.

11. A. Rosenfeld, Cooperative computation in
texture analysis, in Proceedings, Image
Understanding Workshop, November 1979, 52-56.

12. T. H. Hong, A. Y. Wu, and A. Rosenfeld,
Feature value smoothing as an aid in texture
analysis, TR-844, Computer Vision Laboratory,
Computer Science Center, University of Maryland,
College Park, MD, December 1979.

13. L. Kitchen and A. Rosenfeld, Gray level corner
detection, TR-887, Computer Vision Laboratory,
Computer Science Center, University of
Maryland, College Park, MD, March 1980.

14. M. Tavakoli, Toward the recognition of cultural
features, in Proceedings, Image Understanding
Workshop, April 1980.

15. A. Broder and A. Rosenfeld, A note on
collinearity merit, in L. Kitchen, A. Broder, and
A. Rosenfeld, Two notes on digital edges and
lines, TR-885, Computer Vision Laboratory,
Computer Science Center, University of
Maryland, College Park, MD, March 1980.

16. A. Scher, M. Shneier, and A. Rosenfeld,
Clustering of collinear line segments, TR-888,
Computer Vision Laboratory, Computer Science
Center, University of Maryland, College Park,
MD, April 1980.

17. A. Scher, M. Shneier, and A. Rosenfeld, A
method for finding pairs of anti-parallel
straight lines, TR-845, Computer Vision
Laboratory, Computer Science Center, University of
Maryland, College Park, MD, December 1979.

18. S. Peleg and A. Rosenfeld, A min-max medial
axis transformation, TR-856, Computer Vision
Laboratory, Computer Science Center, University
of Maryland, College Park, MD, January
1980.

19. W. S. Rutkowski, Shape segmentation using
arc/chord properties, TR-849, Computer Vision
Laboratory, Computer Science Center, University
of Maryland, College Park, MD, December
1979.

20. H. Samet, Neighbor finding techniques for
images represented by quadtrees, TR-857,
Computer Vision Laboratory, Computer Science
Center, University of Maryland, College Park,
MD, January 1980.

21. P. J. Burt, Tree and pyramid structures for
coding hexagonally sampled binary images,
TR-814, Computer Vision Laboratory, Computer
Science Center, University of Maryland,
College Park, MD, October 1979.

22. S. Ranade, A. Rosenfeld, and H. Samet, Shape
approximation using quadtrees, TR-847, Computer
Vision Laboratory, Computer Science
Center, University of Maryland, College Park,
MD, December 1979.

23. A. Rosenfeld, Some uses of pyramids in image
processing and segmentation, Proceedings,
Image Understanding Workshop, April 1980.

24. P. J. Burt, Fast, hierarchical correlations
with Gaussian-like kernels, TR-860, Computer
Vision Laboratory, Computer Science Center,
University of Maryland, College Park, MD,
January 1980.

25. M. Shneier, Extracting linear features from
images using pyramids, TR-855, Computer Vision
Laboratory, Computer Science Center, University
of Maryland, College Park, MD, January
1980.

26. A. Y. Wu, T.-H. Hong, and A. Rosenfeld,
Threshold selection using quadtrees, TR-886,
Computer Vision Laboratory, Computer Science
Center, University of Maryland, College Park,
MD, March 1980.

27. S. Ranade, A. Rosenfeld, and J. M. S. Prewitt,
Use of quadtrees for image segmentation,
TR-878, Computer Vision Laboratory, Computer
Science Center, University of Maryland,
College Park, MD, February 1980.

28. S. Ranade, Use of quadtrees for edge
enhancement, TR-862, Computer Vision Laboratory,
Computer Science Center, University of Maryland,
College Park, MD, February 1980.


Figure  1.  Nonmaximum  suppression  as  an  aid  in  edge  detection.  (a)  Image;  (b)  digital  (Sobel) 
gradient  magnitudes;  (c)  results  of  suppressing  nonmaxima  in  the  gradient  direction; 
(d)  results  of  thresholding  (b)  at  6;  (e)  histograms  of  (b)  and  (c)  superimposed. 




Figure 2. Surface detection in 3-d arrays.
(a-c) Three consecutive cross-sections
of a CT reconstruction; (d) results of
applying the 2-d Prewitt operator to
the middle cross-section; (e) results
of applying a 3-d Prewitt operator to
the three cross-sections.


(b)
                   Fraction classified as
Class        1       2       3       4       5

  1        .925    .003    .023    .003    .047
  2        .011    .938    .028    .002    .021
  3        .114    .002    .874    .000    .011
  4        .019    .006    .003    .057    .914
  5        .083    .005    .008    .025    .876

(c)
Iteration of     Fraction of class correctly classified
relaxation       1       2       3       4       5

  0            .925    .938    .874    .057    .876
  1            .933    .920    .871    .063    .872
  2            .933    .921    .875    .067    .875
  3            .933    .921    .876    .072    .874
  4            .935    .921    .878    .157    .871
  5            .937    .921    .880    .330    .841
  6            .937    .921    .881    .413    .824
  7            .938    .921    .881    .465    .820
  8            .938    .921    .881    .500    .815

(d)
Iteration of     Fraction of class correctly classified
median           1       2       3       4       5
filtering

  0            .925    .938    .874    .057    .876
  1            .897    .937    .915    .535    .866
  2            .900    .940    .921    .536    .866
  3            .901    .939    .921    .534    .867
  4            .903    .937    .922    .531    .872


Figure 3. Segmentation based on gray level and local "busyness". (a) Image (bottom) and hand
segmentation (top). (b) Confusion matrix for maximum-likelihood classification of the
pixels into sky, brick, grass, bush and shadow, based on bivariate Gaussian fitting
to the populations obtained by hand segmentation; note that the shadow class is
mostly classified as bush. (c) Results of applying probabilistic relaxation to initial
classifications based on Gaussian fitting; note the gradual improvement. (d) Results
of smoothing the busyness values by median filtering prior to clustering and classification;
the improvement is immediate.




Figure 4. Corner detection in grayscale images. (a) Image; (b) results of "cornerity" computation.




Figure  6. 



linked . 


Figure 7. The min-max medial axis transformation
(MMMAT). (a) Images; (b) MMMATs.

Reconstruction from the MMMAT.
(a) Original images; (b-d) reconstructions
from the one, two, and three
largest increments at those points
having values above the 25th percentile
(189, 582, 226, and 462 out of 4096
pixels, in the four cases).




(c)
Level    No. of nodes    Area    Centroid coordinates

  2           15          240       35.63, 33.50
  1           55          392       34.64, 32.38
  0          155          494       34.17, 32.36

Figure 11. Approximating the centroid of a shape using its quadtree representation. (a) Airplane.
(b) Black nodes at each level of the quadtree representation of (a), displayed as black
blocks. (c) Cumulative number of nodes, area, and centroid coordinates as a function of
level.


    (a) a to a          (b) b to b          (c) a to b
  L    LB     UB      L    LB     UB      L    LB     UB

  1     0   4096      1     0   4096      1     0   4096
  2     0   2048      2     0   2304      2     0   2816
  3     0   1152      3     0   1216      3     0   1600
  4     0    608      4     0    592      4   352   1392
  5     0    212      5     0    200      5   504    900
  6     0      0      6     0      0      6   601    601

Figure 12. Lower and upper bounds on the mismatch when two airplanes are matched to themselves and to
each other (L=level, LB=lower bound, UB=upper bound). Note that at level 5, the lower bound
on the mismatch to each other exceeds the upper bounds on the mismatches to themselves.




IMAGE  UNDERSTANDING  RESEARCH  AT  CMU: 
A  Progress  Report 

John R. Kender and Raj Reddy
Department of Computer Science
Carnegie-Mellon University
Pittsburgh, PA 15213


INTRODUCTION 

Our efforts this past half-year have been to continue
to develop an integrated demonstration system, and the
image processing techniques that such a system implies. We
can report progress on several fronts this past six months:
advances in the theory and implementation of low and
middle-level image processing algorithms, basic system
hardware enhancements, and software systems development.
For much of this period we have been concentrating on
creating and readapting algorithms for the VAX UNIX
environment.

We continue to focus our efforts on three major task
areas. The first concerns the understanding of two-dimensional
aerial and satellite images of the Washington,
D.C., area. We are examining cost-effective methods of
matching the visual input against the symbolic map. Major
effort has been applied to the difficult problems that occur
when resolution is very good, including the modeling of
buildings and the problems of shadows. Work on the three-dimensional
analogue of this task, the understanding of
color images of downtown Pittsburgh, has led to further
exploration of the relationships of surface texture to
surface orientation. Development of high-speed convolution
algorithms, applicable to both domains, has also started.

Our second area of concern is the development and
evaluation of computer architectures for computer vision.
Further work in this area awaits the delivery of the SPARC
ultra-high speed signal processing computer (Allen, 1979),
and the installation of the Programmable Sum of Products
Operator chip (Eversole et al., 1980). Both are being
developed on subcontract, to Control Data and Texas
Instruments, respectively. We anticipate that a fair amount
of software development will occur on their arrival.

It  is  in  our  third  task  area  that  we  have  devoted  much 
of  our  energies:  the  development  of  user-friendly 
interactive aids for image processing applications. In
addition  to  continuing  work  on  our  image  data-base 
management  programs  (McKeown,  1979),  we  have  written 
and  modified  many  programs  for  the  new  hardware  and 
software  environment  that  a  UNIX  VAX  with  frame  buffer 
provides.  Some  of  our  code  should  also  make  the  IUS 
testbed  facility  a  more  efficient  environment. 

The  following  is  a  more  detailed  discussion  of  our 
recent  work. 


SYSTEMS 

Software development has spanned many levels of
software needs. We have worked on the facilities that are
necessary for interprocess communication to be added to
UNIX; this will aid the testbed. This has required substantial
modifications and additions to the C language and its manual.

A bit higher, we have defined the necessary new
image format, which more efficiently groups pixels into
blocks instead of rows. We have coded the support and
library routines that allow us to easily manipulate images.
Although a given image can be hardware paged, we have
provided programs to allow them to be software paged, for
convenience and efficiency.

Our Grinnell frame buffer has been installed. We have
begun to convert much of our system-level and user-level
software from our multi-processor Image Understanding
System to its new host, the UNIX VAX. The frame buffer will
provide us with much-needed flexibility in the display of
cultural data superimposed upon raw imagery; it also
contains some local image processing capabilities. We are
implementing a general-purpose user-level subroutine
package for the frame buffer, including a system for
generating the maps for its color mapping hardware.

User-level facilities under development include
programs to display multiple images simultaneously, and
programs  for  flexible  menu  display.  Many  low-level  image 
operators  and  routines  have  been  moved  and  adapted. 
Specific  algorithms  newly  being  brought  up  for  the  task 
domains  are  found  in  the  next  section. 

At the level of existing full user systems, we have
adapted the BROWSE information management system (Fox
et al., 1979) to the VAX, as one of our interactive aids. The
KIWI color-image segmentation system (Shafer, 1980), also
highly interactive (as well as automatic), is in the process of
being moved, too.

And, of course, we have reformatted and copied much
of our large library of images.

TASKS 

Our  efforts  in  the  task  domains  have  been  both 
practical  and  theoretic. 




Two separate efforts have explored algorithms to
handle high-resolution aerial imagery. The first anticipates
a system in which a symbolic map is accurate to the building
level; the signal is matched to the symbols. Current design
employs an image segmentation step, which will be followed
by a chamfer-based matching step (Barrow et al., 1977).
The segmenter is a type of region grower. It appears that
it may be easier to embed any heuristic knowledge
necessary for this task into such a segmenter; further, it
seems it would be easier to integrate such a segmenter into
the matching step, if necessary.

Initial design of the segmenter was similar to that of
Nagao, Matsuyama, and Ikeda (Nagao et al., 1978).
However, it was necessary to augment and rewrite portions
of the algorithm, apparently due to differences in resolution
and our lack of multispectral information. Several low-level
operations were enhanced and tuned. Additionally,
heuristics were added to guide the segmentation.
Experimentation continues.

The second effort takes a somewhat different
approach; it is line-based, and is intended to help build up
the symbolic map. In this task, we have focused on the
problem of extracting highly accurate line boundaries of
buildings. As part of this system, we have experimented
with various ways of obtaining and refining edge profiles.
We believe that an understanding of profiles may lead to a
new, efficient, and highly accurate method of locating and
tracking extended line segments. We anticipate that the use
of the lines themselves will also lead to efficient
representations. This work intends to interactively build
model descriptions of a city, directly from images contained
in our database. Parallel efforts in using line descriptions
for image-to-map matching have also begun.

Continuing research into the relations of textures to
surface orientations has produced more theoretic results
concerning the physical imaging process, the representation
of surface orientation, and the problems of representing
surfaces themselves (Kender, 1980). (Additionally, some
aspects of the theory have been implemented and are being
explored.)

A general method has been developed to find the
camera parameters of focal length and focal point directly
from features in the image. These parameters correspond
roughly to the size of the lens and to the location of the
center of the image before it was cropped. Until now these
parameters had to be known a priori before any
image processing requiring absolute position could begin.
Like all image understanding tasks, the method requires
certain assumptions of the image; the method works best on
scenes with sharp angles. The method suggests that
topological relations between surfaces are far more
important than exact ones; indirectly, it helps explain both
the success and inextensibility of line-labeling schemes.

Additional work has revealed some insight into why
the ground plane is so difficult to find in a single image
under the conditions or assumptions of orthography. At least
in one common instance, the problem of Necker reversal is
compounded by a second ambiguity that leads to situations
reminiscent of the drawings of Escher.


Since textures are often sparse, finding and
representing the surfaces they delineate is a difficult task.
It is often the case that a texture is largely transparent;
consider a window screen, for example. Representing such
textures is difficult, since it implies that the image is
multivalued as far as distances and orientations at a given
pixel are concerned. However, the problem is closely
related to the problems of occlusions, and even of shadows
(which often are common, but rarely are total).

A second area of theoretic endeavor (under ONR
support) is exploring three-dimensional automatic concept
formation. The goal is a program that is given a set of
examples and non-examples of a concept representing a
three-dimensional object (such as a chair); from these it
generates a three-dimensional model of the concept. The
approach describes objects as combinations of generalized
cylinders. Carefully chosen properties of each object
description are extracted, and compared with other objects
to determine how best to describe the concept model.
There are many issues that still need to be studied, among
them devising heuristics for choosing promising features,
and for merging features into good models.

Lastly, we are exploring techniques to decompose
convolution masks into sequentially applied submasks that
are more cost effective. We hope matrix decompositions
will allow us to quickly, but accurately, apply operators or
match templates to our images; these expensive operations
are required in all of our task domains.
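
To make the idea of sequentially applied submasks concrete, the following is a minimal sketch, not the decomposition under study in this report. It assumes the simplest case, a mask that is exactly the outer product of a column vector and a row vector, so that one two-dimensional pass can be replaced by a row pass followed by a column pass. All function names and the example mask are illustrative assumptions, and the masks are applied without flipping (the example mask is symmetric, so this coincides with convolution).

```python
def convolve_rows(image, row_mask):
    """Apply a 1-D mask along every row (valid region only)."""
    k = len(row_mask)
    return [[sum(row[x + i] * row_mask[i] for i in range(k))
             for x in range(len(row) - k + 1)]
            for row in image]


def convolve_cols(image, col_mask):
    """Apply a 1-D mask along every column (valid region only)."""
    k = len(col_mask)
    height, width = len(image), len(image[0])
    return [[sum(image[y + j][x] * col_mask[j] for j in range(k))
             for x in range(width)]
            for y in range(height - k + 1)]


def convolve_2d(image, mask):
    """Apply the full 2-D mask directly (valid region only)."""
    kh, kw = len(mask), len(mask[0])
    height, width = len(image), len(image[0])
    return [[sum(image[y + j][x + i] * mask[j][i]
                 for j in range(kh) for i in range(kw))
             for x in range(width - kw + 1)]
            for y in range(height - kh + 1)]


# Assumed example: a 3x3 smoothing mask that factors as col * row^T.
col = [1, 2, 1]
row = [1, 2, 1]
mask = [[c * r for r in row] for c in col]

# Small synthetic integer image (5 rows by 6 columns).
image = [[(3 * x + 5 * y) % 7 for x in range(6)] for y in range(5)]

direct = convolve_2d(image, mask)                          # 9 multiplies per pixel
separable = convolve_cols(convolve_rows(image, row), col)  # 3 + 3 multiplies per pixel
```

For a k-by-k mask of this kind the two-pass form needs about 2k multiplications per pixel instead of k squared. Masks that are not exactly rank one would need a sum of such terms or an approximation, which this sketch does not attempt.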

REFERENCES 

G. R. Allen and P. G. Juetten, "SPARC: Symbolic Processing
Algorithm Research Computer," Proceedings of the
ARPA Image Understanding Workshop, Science
Applications, Inc., Apr., 1979.

H. G. Barrow, J. M. Tenenbaum, R. C. Bolles, and H. C. Wolf,
"Parametric Correspondence and Chamfer Matching:
Two New Techniques for Image Matching," Proceedings
of the International Joint Conference on Artificial
Intelligence, 1977.

W. L. Eversole and D. J. Mayer, "Programmable Sum of
Products Operator," in this volume.

M. S. Fox and A. J. Palay, "The BROWSE System: An
Introduction," Proceedings of the Annual Conference of
the American Society for Information Science, 1979.

J. R. Kender, "Shape from Texture," Ph.D. Thesis, Computer
Science Dept., Carnegie-Mellon Univ., 1980 (in
preparation).

D. M. McKeown, "Representations for Image Data Bases,"
Proceedings of the ARPA Image Understanding
Workshop, Science Applications, Inc., Nov., 1979.

M. Nagao, T. Matsuyama, and M. Ikeda, "Region Extraction
and Shape Analysis of Aerial Photographs,"
Proceedings of the International Joint Conference on
Pattern Recognition, 1978.

S. Shafer and T. Kanade, "KIWI: A Flexible System for
Region Segmentation," Technical Report, Carnegie-Mellon
Univ., 1980 (in preparation).




UNDERSTANDING IMAGES AT MIT:
REPRESENTATIVE PROGRESS

Kent Stevens, Keith Nishihara,
Brian Schunck, and the Staff

The Artificial Intelligence Laboratory


In this series of image understanding conference proceedings, we have
stressed the issue of representation. In particular, we have described the
development by Horn and his collaborators of the reflectance map, and
the albedo image (in working with satellite images), and we have
described the work of Marr and his group on the primal sketch, the
2 1/2-D sketch, and axis-based 3-D models as part of a comprehensive
theory of recognition.

In the November, 1979 Proceedings, we reviewed our contributions to
the design of adequate representations and enumerated techniques that we
have devised to exploit them.

Here we review work by Horn's group on optical flow and work by
Marr's group concerning zero-crossings, stereo, stereo hardware, and the
contributions of texture gradients to the 2 1/2-D sketch.

Zero-Crossings  and  the  Primal  Sketch 

Marr's group has devoted considerable attention to the theory of the
raw primal sketch of the image, a primitive description of the intensity
changes in terms of blobs, bars, edges, and terminations, which are
characterized by position, orientation, contrast, and size. (The full
primal sketch later emerges when local geometric relations are made
explicit along with larger, more abstract descriptions of groupings,
aggregations, and summarizing descriptions, e.g., of texture. Marr
provided the foundations for the full primal sketch by specifying a
number of grouping operations such as the so-called theta aggregation.
Here we will report on recent progress on the earlier, raw primal
sketch.)

Computing the raw primal sketch falls naturally into two
parts: (i) the intensity changes at a set of different scales are first
computed, since intensity changes occur in natural images over a wide
range of scales, and (ii) the descriptions that arise from these
independent channels are then combined into the raw primal sketch of
the image.

The use of multiple-scale zero-crossings in images filtered by
convolution with the Laplacian of a two-dimensional Gaussian (∇²G)
is now a central component of early vision computations [Marr &
Poggio 1978; Marr, Poggio & Hildreth 1979]. This reflects two
underlying requirements: the need to separate information in an image
according to its scale, and the need to identify fixed locations on viewed
surfaces. Peaks in the rate of intensity change correlate well with physical
locations at the smallest scale, and these peaks correspond to zeros in the
Laplacian. Similar information at other scales can be obtained by first
convolving the image with a Gaussian having a suitable space constant.
Gaussian convolution has the property of removing high frequency
information while preserving the local geometric structure of larger
scale variations in intensity. The two steps, Gaussian convolution
followed by the Laplacian, can be combined into a single operation of
convolution with the Laplacian of the Gaussian, since
∇²(G * I) = (∇²G) * I. ∇²G filtered images are essentially the same as
the difference of Gaussian filtered channels observed in the human
visual system [Marr & Hildreth 1979].
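
The identity above can be checked numerically in one dimension. The sketch below is an illustration only, under stated assumptions: the Laplacian is replaced by the discrete second difference [1, -2, 1], the Gaussian is sampled and truncated, and the test signal is a synthetic step edge. The function names are hypothetical and are not taken from the work reviewed here.

```python
import math


def convolve_full(signal, kernel):
    """Full discrete 1-D convolution (output length n + k - 1)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def gaussian_kernel(sigma, radius):
    """Sampled 1-D Gaussian, truncated at +/- radius and normalised."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def zero_crossings(values, eps=1e-6):
    """Indices i where values[i] and values[i + 1] have opposite signs.
    Both values must exceed eps in magnitude, to ignore round-off noise."""
    return [i for i in range(len(values) - 1)
            if values[i] * values[i + 1] < 0
            and min(abs(values[i]), abs(values[i + 1])) > eps]


SECOND_DIFFERENCE = [1.0, -2.0, 1.0]  # 1-D stand-in for the Laplacian

step_edge = [0.0] * 20 + [1.0] * 20   # synthetic intensity step
g = gaussian_kernel(sigma=2.0, radius=6)

# Two steps: smooth with the Gaussian, then differentiate twice.
two_steps = convolve_full(convolve_full(step_edge, g), SECOND_DIFFERENCE)

# One step: build the combined mask once, then filter with it.
combined_mask = convolve_full(g, SECOND_DIFFERENCE)
one_step = convolve_full(step_edge, combined_mask)

largest_difference = max(abs(a - b) for a, b in zip(two_steps, one_step))
crossings = zero_crossings(one_step)
```

Because discrete convolution is associative, the two results agree up to floating-point round-off. The rising edge produces a sign change in the filtered output; the end of the finite array behaves as a second, falling edge and produces another.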

The zero-crossings are represented by a set of oriented
primitives called zero-crossing segments, each describing a piece of the
contour whose intensity slope (rate at which the convolution changes
across the segment) and local orientation is roughly uniform. Small,
closed contours are represented as blobs, also with an associated
orientation, average intensity slope and size defined by their extent
along a major and minor axis.

Zero-crossings  and  Sampling  Theorems 

Continuing work on Marr and Poggio's stereo theory has led to several
computational results concerning (i) the relationship between mask size
and resolving power at the level of zero-crossings, (ii) the sampling
interval necessary to localize those zero-crossings reliably, and (iii) the
sufficiency of the slopes at zero-crossings as a representation of ∇²G
filtered images. The first two parts of this work started out as questions
about the human visual system. Psychophysical experimentation has
revealed the remarkable capabilities of the human visual system to
resolve fine detail. Marr, Poggio, and Hildreth [1979] have shown that
if the limit of resolution is determined by the smallest separation at
which distinct zero-crossing contours can be obtained between two dots
or lines, then the ∇²G mask of the smallest channel in the visual system
must have a central excitatory diameter (w) of at most 1.5 minutes of
arc. This is about an octave smaller than the smallest channel (w = 4.38
minutes) measured psychophysically by Wilson and Bergen [1979].
This fifth channel is also consistent with the optical limitations of the
eye, which place a theoretical limit on resolution, and with known
physiological data concerning midget ganglion cells in the retina, which
are believed to be driven by a single cone cell.

There is a second type of visual acuity, called hyperacuity,
which refers to our ability to make accurate judgments requiring the
localization of some visual feature to a resolution of a few seconds of
arc, roughly one fifth the diameter of the smallest foveal cone cells
[Westheimer 1976]. The input to the visual cortex has a sampling
interval of about 1 minute of arc, which is even coarser than the initial
cone spacing. To explain this, Marr, Poggio and Hildreth [1979] and
Crick, Marr and Poggio [1980] have shown that zero-crossings can be
localized to within a few seconds of arc by straightforward interpolation
between values at the sample points.
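
As an illustration of localizing a zero-crossing more finely than the sample spacing, here is a minimal sketch that assumes plain linear interpolation between adjacent samples. The cited papers may use a different interpolation scheme; the function name and the test signal are assumptions made for this example.

```python
def interpolate_zero_crossings(samples, spacing=1.0):
    """Positions where a sampled signal changes sign, estimated by
    linear interpolation between the two neighbouring samples.
    `spacing` is the sampling interval, in whatever unit is wanted."""
    positions = []
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        if a == 0.0:
            positions.append(i * spacing)        # sample lies exactly on zero
        elif a * b < 0.0:
            fraction = a / (a - b)               # where the chord crosses zero
            positions.append((i + fraction) * spacing)
    return positions


# Assumed test signal: a straight line crossing zero at x = 2.3 samples.
samples = [x - 2.3 for x in range(5)]

positions = interpolate_zero_crossings(samples)

# Same signal with samples taken 60 arc seconds (1 minute of arc) apart.
positions_arcsec = interpolate_zero_crossings(samples, spacing=60.0)
```

With samples one minute of arc apart, the interpolated position is expressed in arc seconds, a finer unit than the sampling interval. For a signal that is locally close to linear near the crossing, the estimate is correspondingly close to the true position.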

These results have relevance to both the study of human
vision and the development of practical machine vision systems. To
make the properties of ∇²G filtered images more precise, Keith
Nishihara has been studying the constraints placed on ∇²G filtered
images by boundary conditions at (i) their zero-crossings and (ii)
regularly spaced sample points. The objective of the first is to
determine the degree to which the slopes at the zero-crossings
determine the overall filtered signal. It is important to understand how
different two filtered signals can be and still satisfy the same
zero-crossing boundary conditions. The nature of ∇²G filtering does
not allow a uniqueness theorem such as Logan's [Logan 1977].
Nevertheless, it is possible to show that the difference between two
filtered images which have the same zero-crossings and the same slopes
at those zero-crossings is bounded in a useful way which depends on (i)
the smallest distance between zero-crossings at the point in question
and (ii) the range of magnitudes allowed in the original images. These
results are consistent with, and strengthen, the earlier empirical results
by Nishihara which showed that a good approximation to a ∇²G
filtered image can be reconstructed from just the slopes at its
zero-crossings [Winston 1979]. The techniques used to obtain these
results can also be used to bound the difference between two ∇²G
filtered images having the same values at regularly spaced intervals.

Stereo  Implementation  and  Stereo  Hardware 

The implementation by Eric Grimson of Marr and Poggio's stereo
algorithm [Marr & Poggio 1978] has been tested successfully on a wider
range of natural images and this work has led to further refinements of
the implementation [Grimson & Marr 1979, Grimson 1980, see also
Grimson's paper in these Proceedings]. Much of his effort is presently
directed toward the problem of interpolation between the contours of
known depth provided by the stereo program. The basic components
of the computer implementation are straightforward and can be
implemented efficiently in hardware.

Noble Larson has completed the design of hardware to
compute ∇²G convolutions at near video rates using the Hughes CCD
convolver chip [Nudd et al. 1979]. The hardware construction is
underway and should be completed soon after the chip becomes
available from Hughes. Larson and Nishihara are also working on the
hardware design for the stereo matcher, which will work off of the
outputs of two convolvers, one for each of the stereo images, and it
should also operate at near video rates. The stereo hardware has two
components, matching and statistics checking. The matching step
involves determining the number of possible matches (zero-crossings
with the same sign) in a neighborhood in the left image that is twice as
wide as the positive part of ∇²G. The neighborhood is centered on the
position corresponding to a zero-crossing in the right image (see
Grimson's paper for further details). If there is only one possible match
it is accepted as a candidate match and its position relative to the
position of the zero-crossing in the other image is passed on to the
statistics module. The statistics module receives this information along
with bits indicating whether there was a zero-crossing to be matched
and whether or not at least one possible match was found, as a raster
scan from the matching module. The Marr and Poggio algorithm
allows candidate matches to be accepted only if the ratio of matches
found to number of zero-crossings requiring matches in a
neighborhood about the candidate match is greater than what would be
expected if unrelated images were being compared. This computation
will be accomplished by buffering the last 20 or so lines of data from
the matcher and computing a running ratio for a 20 by 20
neighborhood at the current raster position.
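
The statistics check described above can be sketched in software as follows. This is a hedged illustration, not the hardware design: the neighborhood is taken as the block of raster positions ending at the current position (which is what a buffer of the most recent lines makes available), the acceptance threshold is an assumed parameter, and all names are hypothetical.

```python
def neighborhood_ratio(needs_match, matched, y, x, size=20):
    """Ratio of matches found to zero-crossings requiring a match in the
    size-by-size block of raster positions ending at (y, x)."""
    wanted = found = 0
    for row in range(max(0, y - size + 1), y + 1):
        for col in range(max(0, x - size + 1), x + 1):
            wanted += needs_match[row][col]
            found += needs_match[row][col] * matched[row][col]
    return found / wanted if wanted else 0.0


def accept_candidates(needs_match, matched, threshold, size=20):
    """Keep a candidate match only where the local ratio exceeds the
    threshold (the threshold value is an assumption of this sketch)."""
    height, width = len(matched), len(matched[0])
    return [[bool(matched[y][x]) and
             neighborhood_ratio(needs_match, matched, y, x, size) > threshold
             for x in range(width)]
            for y in range(height)]


# Tiny assumed example: every position holds a zero-crossing to match;
# the left half matches consistently, the right half has one stray match.
needs_match = [[1, 1, 1, 1] for _ in range(4)]
matched = [[1, 1, 0, 0],
           [1, 1, 0, 1],
           [1, 1, 0, 0],
           [1, 1, 0, 0]]

accepted = accept_candidates(needs_match, matched, threshold=0.6, size=2)
```

In this example the isolated match at row 1, column 3 is rejected because only one of the four zero-crossings in its neighborhood found a match, while matches in the consistent left half are kept. A streaming version would maintain running sums over the buffered lines instead of recounting each block.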

Texture  Gradients 

The information content of "texture gradients" has been re-examined
by Stevens [1980]. Texture gradients are systematic variations in the
density, size, and other measures of projected surface texture. It is
generally expected that these texture variations encode information
about the shape of the surface either in the form of surface orientation,
or distance, or perhaps both. Various mathematical relations have been
proposed between quantities in the image texture such as density and
3-D quantities such as distance or slant. However, most of these
relations are not useful since they embody assumptions (in the form of
geometric restrictions, e.g., for global planarity) which are seldom
satisfied in natural scenes. The problems of computing surface
orientation and distance were examined in turn. Each computation is
assumed to have local support in the image. A principal result is that
while both computations share several common limiting factors, the
distance computation is often more robust and accurate than the
surface orientation computation.

The perspective projection may be usefully thought of as
comprising two independent transformations to any patch of surface
texture: scaling and foreshortening. Scaling is due to distance,
foreshortening is due to surface orientation. A decomposition of the
problems of computing distance and surface orientation is therefore
suggested: when computing distance, the texture measure (the specific
numeric quantity extracted from each locality of the image texture)
should vary only with scaling; when computing surface orientation, the
measure should vary only with foreshortening.

One consequence of this is that texture density is not a useful
measure for computing distance or surface orientation, since it varies
with both scaling and foreshortening. This explains the observation
made by some psychologists that a pure density gradient (e.g., of dots)
is ineffective in suggesting a definite 3-D surface. If density is not a
useful measure for computing either distance or surface orientation in
general, what texture measures should we choose?

First consider the distance computation. Distant features on a
surface project to a smaller size than those that are closer, provided the
features are physically the same size. Therefore a smooth surface of
uniform texture presents a continuously varying scale from which
distance up to a multiplicative constant might be recovered. What
remains to be made precise is the notion of "size" or "scale" in terms of
real images. The appropriate measures for the distance computation,
keeping in mind that the measure should vary only with distance and not
with foreshortening, are termed characteristic dimensions and correspond
to nonforeshortened dimensions on the surface. Distance up to a scale
factor may be computed from the reciprocals of the characteristic
dimensions, assuming that the corresponding physical dimensions on
the surface are uniform. Since we assume that no a priori knowledge of
the physical makeup of the surface is available at the point in visual
processing at which the depth map is computed, the computational
problem centers on choosing the characteristic dimensions, for a natural
image presents a wealth of potentially useful dimensions in any locality
of the image. Fortunately, characteristic dimensions may be defined in
the image by the following geometric properties: they are locally
parallel, oriented perpendicular to the texture gradient, and parallel
to the orientation of greatest texture regularity. Analysis of the local
image texture can then identify the characteristic dimensions, and their
reciprocals specify the depth map.
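
As an illustration of the last step only, the following minimal sketch turns a list of measured characteristic dimensions into relative depths. The function name and the normalization to the nearest sample are assumptions made for this example; the text specifies only that depth is proportional to the reciprocal of the characteristic dimension.

```python
def relative_depth(characteristic_dims):
    """Return depth up to a scale factor for each image locality.

    Each depth is the reciprocal of the local characteristic dimension,
    normalized so that the nearest locality has depth 1.0 (the choice of
    normalization is an assumption of this sketch).
    """
    if any(d <= 0 for d in characteristic_dims):
        raise ValueError("characteristic dimensions must be positive")
    reciprocals = [1.0 / d for d in characteristic_dims]
    nearest = min(reciprocals)
    return [r / nearest for r in reciprocals]


# Hypothetical measurements: the projected dimension halves twice.
depths = relative_depth([8.0, 4.0, 2.0])
```

Halving the projected dimension doubles the estimated distance, provided the physical dimensions on the surface are uniform as the text requires.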

The precision and accuracy of the depth map is limited by the
uniformity across the surface of the physical dimensions that
correspond to characteristic dimensions. Evidence of uniformity is
present in the textured image; this evidence would be useful in
restricting the distance computation to those instances where the depth
map would likely be correct. The visual evidence for uniformity of the
actual surface texture is both local and global. Locally the texture must
project as regular (e.g., the characteristic dimensions must have small
variance locally) and globally the texture must be qualitatively similar.
Examples of similarity measures might be color and intensity statistics,
coarse shape description and other measures that are roughly invariant
over perspective projection. The local regularity and global qualitative
similarity together allow one to deduce global uniformity, for
constraints on the physical texture that are so strong as to restrict the
surface markings to a small range of sizes in any locality are often
independent of the position of the markings on the surface. (For
example, oak leaves strewn across a yard are qualitatively similar and
have similar sizes. The global uniformity in leaf size is a consequence
of how leaves grow and is independent of how they are distributed
across the ground.)

Surface orientation is also believed to be computable from the
texture gradient. There are actually two paths that might lead to a
representation of local surface orientation [Marr 1977] where the
primitives specify the slant and tilt [Stevens 1980] of each visible patch
of surface. The first path is to first compute a depth map, e.g. by the
method just described, then to compute the slant and tilt from the
gradient of distance. This indirect path is successful only when there is
significant scale variation in the image and fails for surfaces in
orthographic projection. (If the surface is relatively distant, so that
the variation in distance to the surface is insignificant relative to the
mean distance, then the projection is effectively orthographic, or
parallel projection. Surface orientation cannot be computed from the
depth map in those cases because the depth map would falsely indicate a
flat surface in the frontal plane.) The other path is to attempt to
compute surface orientation directly from the image. Accordingly, the
texture measures used in doing so should vary with foreshortening but
not vary with scaling. However, such measures are difficult to interpret
unless the particular foreshortening function is known which relates the
measure to surface slant. Furthermore, successive occlusion associated
with viewing texture which lies in relief relative to the mean surface
level acts to confound the apparent foreshortening. Slant is therefore
difficult to compute. However, the tilt may be computed as the
orientation of the characteristic dimensions.
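
The indirect path (depth map first, then orientation from the gradient of distance) can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the depth map is sampled on a regular grid whose spacing is expressed in the same units as depth, takes slant as the angle whose tangent is the magnitude of the depth gradient, and takes tilt as the image direction of that gradient. The function name and the test surface are hypothetical.

```python
import math


def slant_tilt(depth, row, col, spacing=1.0):
    """Estimate (slant, tilt) in radians at an interior grid point.

    Central differences approximate the gradient of distance; slant is
    atan(|gradient|) and tilt is the direction of the gradient.
    """
    dz_dx = (depth[row][col + 1] - depth[row][col - 1]) / (2.0 * spacing)
    dz_dy = (depth[row + 1][col] - depth[row - 1][col]) / (2.0 * spacing)
    slant = math.atan(math.hypot(dz_dx, dz_dy))
    tilt = math.atan2(dz_dy, dz_dx)
    return slant, tilt


# A plane receding to the right: depth grows by one unit per column.
plane = [[10.0 + x for x in range(3)] for _ in range(3)]
slant, tilt = slant_tilt(plane, 1, 1)
```

When the depth map is nearly constant, as in the effectively orthographic case described above, both differences vanish and the sketch reports zero slant, which is the false frontal-plane result the text warns about.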

Atmospheric  Modeling 

Turning now to Horn's work, recall that in the previous proceedings we
listed the following uses for synthetic images: automated generation of
shaded relief maps, generation of low-level, obliquely-viewed images,
generation of special maps that bring out particular terrain features,
classification of ground cover for crop prediction, matching images to
terrain data for satellite navigation, and making maps for automatic or
semiautomatic change detection.

In general, four factors must be considered when making
synthetic images for these purposes. They are: (i) imaging geometry -
the projection of the viewed scene onto the image, (ii) incident
illumination - the intensities and distribution of light sources, (iii)
surface photometry - the way a surface reflects light, and (iv) surface
topography - the shape of things in the scene.

Synthetic images that are to mimic real ones obtained from
spacecraft require attention to a fifth factor: the atmosphere attenuates
visual signals, scatters spurious light into the viewing port of the
satellite, and illuminates the ground as a large, diffuse light source.
Sjoberg's paper in these Proceedings describes work on modeling such
effects.




Optical  Flow 

We are beginning research to develop an algorithm that will determine
the motion of objects in a scene from a sequence of views of the scene
taken at closely spaced times. In particular, we will determine the flow
of motion of the image intensities that comprise the successive views of
the scene, restricting our attention to the situation where the viewer is
stationary and objects in the scene are possibly in motion. More
precisely speaking, we are concerned with the situation where a scene
with moving components is projected onto a stationary image plane and
the changing images are quantized in both space and time. The
problem is that given samples of the projection of the scene onto the
image plane we would like to construct an estimate of the velocity flow
of objects across the image plane, and we would like the density of the
construction, that is, the number of points at which the flow is
determined, to be at least as great as the given density in the successive
image frames. The resulting velocity field, derived from the successive
images by using the information in the image intensities to maximum
benefit, provides a complete picture of the motion of objects in the
scene.

It must be stressed that we wish to develop an algorithm that uses the
information in the image intensities to the fullest advantage; we are
unwilling to incorporate higher level knowledge about the
constituents of the scene into our algorithm. We will only use the
constraints that can be derived from the physics of image irradiance and
the relationships between the image irradiances from successive views
of a scene. The starting point of this work will be an equation of
constraint derived by noting that the change in image intensity for small
displacements of objects in the scene will be zero. Specifically, let
E(x, y, t) denote the image irradiance at point (x, y) in the image frame
taken at time t. Then the requirement that the total change in image
intensity be zero translates to


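$$\frac{dE}{dt} = 0,$$

which by expanding according to the chain rule becomes

$$\frac{\partial E}{\partial x}\frac{dx}{dt} + \frac{\partial E}{\partial y}\frac{dy}{dt} + \frac{\partial E}{\partial t} = 0.$$

Denoting the x and y components of the velocity field by u = dx/dt and
v = dy/dt, we obtain a linear equation in the two variables u and v,

$$\frac{\partial E}{\partial x}\,u + \frac{\partial E}{\partial y}\,v + \frac{\partial E}{\partial t} = 0,$$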

which is valid at each point in the successive image frames under the
assumption that the displacement of moving objects in successive
frames is sufficiently small that changes in image irradiance due to
variation in orientation of the surface elements relative to the viewer
and sources of illumination may be neglected. Given that the effect of
such variation is negligible, the validity of the equation does not
depend on the reflectance properties of the objects in the scene as long
as the objects are opaque.

The partial derivatives that are the coefficients of the
equation may be computed at each point in each successive
frame independent of the particular values of u and v. So at each point
in an image frame we will have a constraint on the possible values of
the motion. The constraint may be viewed as a line in (u, v) space along
which the actual motion may lie. The lines of constraint from
neighboring points may then be combined, using the information in the
image intensities, to determine the velocity field.
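
To make the constraint line concrete, here is a minimal sketch that estimates the three coefficients at one pixel and reports the point of the line closest to the origin of (u, v) space. The finite-difference estimators, the unit frame interval, the function names, and the synthetic frames are assumptions of this example; the text notes below that accurate numerical estimation of the coefficients is itself difficult.

```python
def constraint_coefficients(frame0, frame1, row, col):
    """Estimate (Ex, Ey, Et) at an interior pixel by finite differences."""
    ex = (frame0[row][col + 1] - frame0[row][col - 1]) / 2.0
    ey = (frame0[row + 1][col] - frame0[row - 1][col]) / 2.0
    et = frame1[row][col] - frame0[row][col]  # unit time between frames
    return ex, ey, et


def nearest_point_on_constraint_line(ex, ey, et):
    """Point of the line Ex*u + Ey*v + Et = 0 closest to the origin."""
    mag2 = ex * ex + ey * ey
    if mag2 == 0.0:
        return None  # no brightness gradient, so no constraint here
    return (-et * ex / mag2, -et * ey / mag2)


# A brightness ramp E = 2x that shifts one pixel to the right per frame.
frame0 = [[2.0 * x for x in range(3)] for _ in range(3)]
frame1 = [[2.0 * (x - 1) for x in range(3)] for _ in range(3)]
ex, ey, et = constraint_coefficients(frame0, frame1, 1, 1)
u, v = nearest_point_on_constraint_line(ex, ey, et)
```

Only the velocity component along the brightness gradient is fixed by a single pixel; any v satisfies the constraint in this example because Ey is zero, which is why constraints from several points must be combined.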

In practice, the coefficients of the equation will be difficult to
compute accurately by numerical methods. Any scheme for combining
the constraints in a straightforward manner will suffer from insufficient
accuracy. Even if the coefficients could be accurately computed, it is
not possible to merely intersect the constraint lines from objects
undergoing rotational motion or translational motion along a
line that is primarily parallel to the direction of view, since adjacent
points in the image would not have equal velocities.

Nevertheless, the problem of computing the velocity flow
requires computing a consistent solution to a large set of linear
equations that overdetermine the solution. The computational
problems are not insurmountable. We intend to continue investigating
schemes for combining the motion constraints inherent at each point in
successive image frames into a complete and accurate estimate of the
motion of objects in the scene.
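
One simple way to obtain a consistent solution from an overdetermined set of constraint equations is ordinary least squares over a small patch. The sketch below is offered only as an illustration and is not the scheme the authors propose: it assumes a single velocity (u, v) is shared by every point in the patch, which is exactly the assumption the text says fails for rotation and for motion along the line of sight. The function name and the sample constraints are hypothetical.

```python
def least_squares_velocity(constraints):
    """Solve Ex*u + Ey*v + Et = 0 in the least-squares sense.

    `constraints` is a list of (Ex, Ey, Et) triples, one per image point,
    all assumed to share the same velocity (u, v).
    """
    sxx = sum(ex * ex for ex, ey, et in constraints)
    sxy = sum(ex * ey for ex, ey, et in constraints)
    syy = sum(ey * ey for ex, ey, et in constraints)
    sxt = sum(ex * et for ex, ey, et in constraints)
    syt = sum(ey * et for ex, ey, et in constraints)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("gradients are parallel; velocity is not determined")
    # Normal equations: [sxx sxy; sxy syy] [u; v] = [-sxt; -syt]
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v


# Three constraints generated from the true velocity (1, -2).
u, v = least_squares_velocity([(1.0, 0.0, -1.0), (0.0, 1.0, 2.0), (1.0, 1.0, 1.0)])
```

The error raised for parallel gradients corresponds to the case where all constraint lines coincide and no intersection exists.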

Bibliography 

Fo,an  aiMMctacusto  OfMlT.ort  Image  UnOcraandmg  „ 

F. H. C. Crick, D. Marr and T. Poggio, "An Information Processing
Approach to Understanding the Visual Cortex,"

F  C",M  C"“-  K  O.  »  Wo,ta. 

1“  °l  Human  Siereo 

Camhe.0.  u  ,  'C"i  M,s“eh“»“  Inslimic  cl  Technology, 
Cambridge,  Massachusetts.  1980  (in  preparation). 




W. E. L. Grimson and D. Marr, "A Computer Implementation of a
Theory of Human Stereo Vision," Proceedings of ARPA Image
Understanding Workshop, 41-45, April 1979.

Berthold K. P. Horn, "The Position of the Sun," Working Paper 16?,
The Artificial Intelligence Laboratory, Massachusetts Institute of
Technology, Cambridge, Massachusetts, 1978.

Berthold K. P. Horn and Brett L. Bachman, "Using Synthetic Images to
Register Real Images with Surface Models," AIM-437, The Artificial
Intelligence Laboratory, Massachusetts Institute of Technology,
Cambridge, Massachusetts, 1977.

Berthold K. P. Horn and Robert W. Sjoberg, "Calculating the
Reflectance Map," AIM-495, The Artificial Intelligence Laboratory,
Massachusetts Institute of Technology, Cambridge, Massachusetts,
1978. Also in Applied Optics, June, 1979.

Berthold K. P. Horn and Robert J. Woodham, "LANDSAT MSS
Coordinate Transformations," AIM-465, The Artificial Intelligence
Laboratory, Massachusetts Institute of Technology, Cambridge,
Massachusetts, 1978.

Berthold K. P. Horn and Robert J. Woodham, "Destriping Satellite
Images," AIM-467, The Artificial Intelligence Laboratory,
Massachusetts Institute of Technology, Cambridge, Massachusetts,
1978.

Berthold K. P. Horn, Robert J. Woodham and William M. Silver,
"Determining Shape and Reflectance using Multiple Images,"
AIM-490, The Artificial Intelligence Laboratory, Massachusetts
Institute of Technology, Cambridge, Massachusetts, 1978.

B. F. Logan, "Information in the Zero-crossings of Bandpass Signals,"
Bell System Technical Journal 56, 487-510, 1977.

D. Marr, "Representing Visual Information," AIM-415, The Artificial
Intelligence Laboratory, Massachusetts Institute of Technology,
Cambridge, Massachusetts, 1977.

D. Marr and E. Hildreth, "Theory of Edge Detection," AIM-518, The
Artificial Intelligence Laboratory, Massachusetts Institute of
Technology, Cambridge, Massachusetts, 1979.

D. Marr and T. Poggio, "A Computational Theory of Human Stereo
Vision," AIM-451, The Artificial Intelligence Laboratory,
Massachusetts Institute of Technology, Cambridge, Massachusetts,
1978.

D. Marr, T. Poggio and E. Hildreth, "Evidence for a Fifth, Smaller
Channel in Early Human Vision," AIM-541, The Artificial Intelligence
Laboratory, Massachusetts Institute of Technology, Cambridge,
Massachusetts, 1979.


D. Marr, T. Poggio and S. Ullman, "Bandpass Channels,
Zero-crossings, and Early Visual Information Processing," AIM-491,
The Artificial Intelligence Laboratory, Massachusetts Institute of
Technology, Cambridge, Massachusetts, 1978.

G. R. Nudd, S. D. Fouse, T. A. Nussmeier, and P. V. Nygaard,
"Development of Custom-Designed Integrated Circuits for Image
Understanding," Proceedings: Image Understanding Workshop, 19
November 1979.

Kent A. Stevens, "Surface Perception from Local Analysis of Texture
and Contour," TR-512, The Artificial Intelligence Laboratory,
Massachusetts Institute of Technology, Cambridge, Massachusetts,
1980.

G. Westheimer, "Diffraction Theory and Visual Hyperacuity," Am. J.
Optometry and Physiol. Optics 53, 362-364, 1976.

H. R. Wilson and J. R. Bergen, "A Four Mechanism Model for
Threshold Spatial Vision," Vision Res. 19, 19-32, 1979.

P. H. Winston, "MIT Progress in Understanding Images," Proceedings:
Image Understanding Workshop, 25-35, April 1979.




PROGRESS  AT  THE 

ROCHESTER  IMAGE  UNDERSTANDING  PROJECT 


J. A. Feldman
K.  R.  Sloan,  Jr. 


The  University  of  Rochester 
Rochester, New York 14627


1. Model Refinement

1.1.  Constraint  Networks  and  Procedural 
Description 

One important goal of the Rochester Vision Project is to
investigate a generalized representation of complex objects by
semantic networks. In our formulation these include procedural
invocation in which an executive procedure chooses worker
procedures to perform a job not just on the basis of input/output
behavior (as traditional pattern-directed invocation does), but
also taking into account cost/benefit estimates and perhaps other
information as well. This scheme is motivated by the desire to
have the advantages of declarative knowledge about what is doable
(the descriptions) along with the advantages of procedural
knowledge about how to do it (the workers). The declarative,
descriptive component will allow conveniences such as the modular
addition of procedural knowledge. The main research issue is to
decide what exactly needs to be known about worker procedures, and
how to express that in a useful and uniform manner. This must also
be coordinated with the use of relational constraints [Russell and
Brown, 1978]. A recent paper at Rochester exploring aspects of
these issues is [Lantz et al., 1978].

1.2.  Decision  Theory 


The use of decision theory not only as an abstract model of
intelligent perception but as a practical tool to maximize
computational benefit/cost is being investigated in the context of
procedural invocation. This work continues in the tradition of
Bolles and Garvey, and ultimately we hope to extend some of their
results to deal with formal problems that more closely approximate
the sorts of vision problems encountered in our particular domain.
Ballard (see Section 9) uses decision theory techniques to choose
the most economical method (assuring adequate accuracy) of locating
anatomical structures in large-format images.
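
The selection rule in the last sentence (cheapest method that still assures adequate accuracy) can be sketched as below. The `Method` record, the method names, and the cost and accuracy figures are all hypothetical and are not taken from the Rochester system; a fuller decision-theoretic executive would weigh expected utility rather than apply a hard accuracy threshold.

```python
from dataclasses import dataclass


@dataclass
class Method:
    name: str
    cost: float      # estimated processing cost, arbitrary units
    accuracy: float  # estimated probability of a correct location


def choose_method(methods, required_accuracy):
    """Return the cheapest method meeting the accuracy requirement, or None."""
    adequate = [m for m in methods if m.accuracy >= required_accuracy]
    if not adequate:
        return None
    return min(adequate, key=lambda m: m.cost)


# Hypothetical worker procedures with their descriptions.
workers = [
    Method("template_match", cost=10.0, accuracy=0.95),
    Method("edge_follow", cost=3.0, accuracy=0.80),
    Method("hough_search", cost=5.0, accuracy=0.92),
]
chosen = choose_method(workers, required_accuracy=0.90)
```

Keeping cost and accuracy in a declarative record, separate from the worker code, mirrors the split between descriptions and workers described in Section 1.1.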


2. Applications in Biomedicine

Work has been underway at Rochester for several years on
developing techniques for reliably detecting specific visual
features, even in the presence of considerable noise. Our work has
been based on generalizations of the Hough technique, which
accumulates evidence for straight lines at various slope and
intercept values using an accumulator array. For some time, we have
been successfully employing extended Hough techniques to locate
second-order curves like elliptical sections and circles. In the
last six months we have been able to extend these techniques to
handle a broad class of features [Ballard, 1979a]. There is reason
to believe that these noise-resistant feature identification
methods can be combined with our constraint graph techniques (cf.
Section 1) to yield a robust and general analyzer for industrial
site images.
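
The basic slope and intercept accumulator described above can be sketched in a few lines. This illustrates only the classical straight-line case, not the generalized technique of [Ballard, 1979a]; the function name, the candidate slopes, the intercept bin width, and the sample points are assumptions of this example.

```python
from collections import Counter


def hough_lines(points, slopes, intercept_step=1.0):
    """Vote in a (slope, intercept-bin) accumulator for lines y = m*x + b."""
    accumulator = Counter()
    for x, y in points:
        for m in slopes:
            b = y - m * x  # intercept of the line of slope m through (x, y)
            accumulator[(m, round(b / intercept_step))] += 1
    return accumulator


# Four points on y = 2x + 1 plus one noise point.
points = [(0, 1), (1, 3), (2, 5), (3, 7), (2, 0)]
accumulator = hough_lines(points, slopes=[-2, -1, 0, 1, 2])
(best_slope, best_intercept), votes = accumulator.most_common(1)[0]
```

The noise point contributes one vote to several cells but cannot build a peak, which is the source of the noise resistance mentioned in the text.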


3. Application in Aerial Image Analysis

The three-level organization of image analysis (strategist,
executive, worker) and a further exploration of useful procedural
description mechanisms were first applied to photointerpretation
work in [Lantz et al., 1978]. The object is to use the sorts of
knowledge-based inferencing used by skilled photointerpreters, with
models inspired by photointerpretation keys for identifying small
industries, to do reliable and flexible identification of small
industrial installations.

A second phase of experimentation was the analysis of selected
sites using locally acquired imagery. We have now acquired and
digitized a sample image from the Defense Mapping Agency and are
working on the [...] generation system. The current plan is to rely
heavily on the general techniques described in Sections 1 and 2
above.

4. Image Encoding and Transmission

4.1  Hierarchical  Image  Encodings 

Communication of images, and of information about images, is an
important part of any image understanding project. We have been
investigating the use of various hierarchical image encodings. One
of the image transmission schemes we have investigated is closely
related to "pyramid" data structures. We have demonstrated that
high resolution raster images can be effectively transmitted over
relatively low-bandwidth lines by sending a series of low
resolution approximations, which converge to the final image [Sloan
and Tanimoto, 1978].
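
A minimal sketch of the coarse-to-fine idea follows. It assumes a square image whose side is a power of two and builds each coarser level by averaging 2x2 blocks; both choices, and the function names, are assumptions of this example. For simplicity every level is sent in full, whereas a practical scheme such as that of [Sloan and Tanimoto, 1978] would send only what is needed to refine the previous level.

```python
def halve(image):
    """Return an image of half the resolution by averaging 2x2 blocks."""
    half = len(image) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(half)
        ]
        for r in range(half)
    ]


def progressive_levels(image):
    """List approximations from coarsest (1x1) to the full image."""
    levels = [image]
    while len(levels[-1]) > 1:
        levels.append(halve(levels[-1]))
    return levels[::-1]


image = [
    [0.0, 0.0, 4.0, 4.0],
    [0.0, 0.0, 4.0, 4.0],
    [8.0, 8.0, 12.0, 12.0],
    [8.0, 8.0, 12.0, 12.0],
]
levels = progressive_levels(image)  # transmit levels[0], then levels[1], ...
```

The receiver can display each level as it arrives, so a recognizable picture appears long before the final resolution has been transmitted.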

A second hierarchical encoding is described elsewhere [Ballard,
1979b]. A Strip Tree is an elegant encoding for curves which
represents both open curves (linear features) and closed curves
(areas) in a uniform manner. Strip Trees are closed under
intersection and union, and operations on them can be carried out
at different resolutions. These properties make them a nearly ideal
representation for map data bases.
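
The multi-resolution property can be illustrated with a simplified strip tree. In this sketch each node covers a run of a polyline with a strip defined by the chord between its end points and a single half-width (the largest perpendicular deviation), and splits at the point of largest deviation. The single symmetric width, the dictionary layout, and the function names are simplifying assumptions of this example rather than the representation defined in [Ballard, 1979b], and the intersection and union operations are not shown.

```python
import math


def _deviation(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / length


def build_strip_tree(points):
    """Recursively enclose a polyline in strips of decreasing width."""
    a, b = points[0], points[-1]
    deviations = [_deviation(p, a, b) for p in points]
    width = max(deviations)
    node = {"start": a, "end": b, "width": width, "children": []}
    if len(points) > 2 and width > 0.0:
        k = deviations.index(width)  # interior point farthest from the chord
        node["children"] = [build_strip_tree(points[:k + 1]),
                            build_strip_tree(points[k:])]
    return node


def approximate(node, tolerance):
    """Return a polyline using only strips no wider than `tolerance`."""
    if node["width"] <= tolerance or not node["children"]:
        return [node["start"], node["end"]]
    left = approximate(node["children"][0], tolerance)
    right = approximate(node["children"][1], tolerance)
    return left + right[1:]


curve = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]
tree = build_strip_tree(curve)
coarse = approximate(tree, tolerance=2.0)
fine = approximate(tree, tolerance=0.0)
```

Descending the tree only as far as the required tolerance is what lets operations run at different resolutions.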

4.2  Composition  and  Re-interpretation  of 
Images 


It is often convenient to specify an image in terms of the
combination of several existing images, rather than transmit an
entire new image. The combination or re-interpretation may
sometimes be performed with relatively simple hardware devices. We
have developed and implemented several such techniques based on the
"video lookup table" supplied with our Grinnell GMR-26 display
[Sloan and Brown, 1979]. These techniques are currently being used
to overlay map features on aerial images, display three-dimensional
surfaces under quickly varying lighting conditions, and show short,
repetitive motion sequences.

5. Component Building

5.1. Hardware

The Grinnell GMR-26 display device is DMA-interfaced to an
Eclipse computer, and has been invaluable as an output device for
our experiments. An Optronics Colorscan C-4100 drum scanner is on
site and interfaces to the Vision Eclipse. Both Eclipse computers
are fully configured and have been running effectively with our
distributed system software. A VAX 11/780 (purchased with non-DoD
funds) is operating and has been integrated into the local network.
A new, larger capacity Eclipse has been added to the gateway
configuration, giving greater capacity and reliability. We are
expecting several additional personal computers and a laser printer
later this year.

5.2. Software

Advanced system software support is now used routinely, and more
is under development. Communications protocols and distributed
computing packages [Feldman 1978, Scheininger and Sabbah 1977,
Selfridge 1979, Sloan] have been developed to allow access to the
GMR-26 through the local ALTO computers or the remote PDP-10, to
achieve reliable transmission between distributed processes, to
produce graphics and halftone images on ALTO screens from the
PDP-10, and to allow file transfer and telnet to the Arpanet. At
Rochester, the RIG message is the lingua franca that allows
processes on remote machines to command the GMR-26, perform file
manipulations, and carry out other operations. Some of our work has
been utilized by other image understanding groups, most extensively
at SRI. We have been working closely with other IU contractors
(particularly CMU) to develop a uniform communication facility for
use in the testbed.


A comprehensive library of vision routines [Sloan 1977-78] has
been developed, centralized, documented, and incorporated into the
NEXUS system. They allow interactive users a wide range of
image-processing and display (graphics, halftone, color and B&W TV)
capabilities. A program to acquire images from the Optronics
scanner and package them according to our Raster Image File Format
[Selfridge and Sloan, 1979] has been developed and is in routine
use.

6. Motion Understanding

Understanding motion pictures has always presented an unusually
difficult problem to computer vision efforts. The compelling
gestalt induced in humans by moving objects is not well understood,
and so there is little leverage on the immediate problems resulting
from the large mass of data in multi-frame images. We began on a
pared-down version of the problem which nevertheless offers an
interesting set of perceptual phenomena to model. The domain is
multi-frame images of animal motion; initial research is being
carried out on sequential images of points of light attached to
joints. A detailed progress report was presented at the last IU
Workshop.

7. Parallel Algorithm Development

This is a new development in our laboratory which resulted from a
combination of several separate prior efforts. The main idea is
that there is a systematic duality between our generalized Hough
techniques (cf. Section 2) and the highly parallel models of
computation we are developing for describing animal vision
[Feldman, 1979]. This has given rise to some preliminary ideas for
highly parallel implementations of image understanding algorithms
which seem promising. These ideas are being pursued in cooperation
with the VLSI and theory of computation groups at Rochester.

8.  Texture 


Textural areas can be thought of as those parts of an image where
segmentation based on normal similarity measures fails. Meaningful
analysis of textured areas must include discrimination between
different textures and detection of parts of the same texture. The
similarity of textures which are identical except for a scale
change, a rotation, or a different range of intensities must be
recognized.

We approach the texture problem by dividing texture regions into
meaningful sub-elements of similar intensity sample points, then
using rotation- and scale-invariant shape measures to characterize
these regions and finally determining spatial relationships among
our sub-elements. By using a decision tree program structure,
easily discriminated textures are separated quickly, and more
complex textural structure is extracted only when necessary
[Maleson, Brown, and Feldman, 1977]. A major report on this work
will be available this summer.
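
As one example of a rotation- and scale-invariant shape measure, the sketch below computes the compactness 4*pi*area/perimeter^2 of a polygonal sub-element. The source does not say which shape measures are used, so compactness is a stand-in chosen for this illustration, and the function names and test polygon are hypothetical.

```python
import math


def compactness(polygon):
    """4*pi*area / perimeter**2 for a simple polygon of (x, y) vertices."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0          # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return 4.0 * math.pi * (abs(twice_area) / 2.0) / (perimeter ** 2)


def scale_and_rotate(polygon, scale, angle):
    """Apply a uniform scale and a rotation (radians) about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y), scale * (s * x + c * y))
            for x, y in polygon]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
original = compactness(square)
transformed = compactness(scale_and_rotate(square, scale=3.0, angle=0.7))
```

Because area scales with the square of size and perimeter scales linearly, the ratio is unchanged by scaling, and neither quantity depends on orientation.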

9. Applications in Biomedicine

The model-directed finding of ribs in chest radiographs [Ballard,
1978] provides an illustration of the use of the Rochester Vision
System, incorporating procedure description, utility measures, and
top-down, model-directed perception. The object here is to cope
with large amounts of possibly low-quality data without undue
processing time by depending on a declarative model of anatomical
structures, described procedural knowledge about how to locate
them, and an executive which uses decision theory to control the
image-understanding process. A prototype complete analysis system
is now being developed.


A novel and uniform method of describing arbitrary functions on
the unit sphere (which define "museum-viewable" volumes) is under
investigation, with immediate application to anatomical structures
[Schudy and Ballard, 1979]. The idea is related to the well-known
Fourier descriptions of two-dimensional shape. Volumes are modelled
and described as the leading coefficients in certain spherical
harmonic expansions of the volume functions. This method also
allows least squared error fitting of volumes in coefficient space,
which interfaces nicely with routines that locate the
three-dimensional boundaries of volumes in image data.

Applications of generalized cylinders [Agin, 1972] previously
have been limited to simple cross sections. We use B-splines as an
embedding for generalized cylinders [Shani, 1979]. This allows an
efficient realization of the original notion of generalized
cylinders as arbitrary cross sections about a space curve.


REFERENCES

Agin, A.P., Representation and description of curved objects,
     Stanford AI Project Memo AIM-i^ (and Ph.D. thesis), October,
     1972.

Ballard, D.H., Model-directed detection of ribs in chest
     radiographs, TR11, Computer Science Department, University of
     Rochester, March 1978.

Ballard, D.H., Generalizing the Hough Transform to Detect Arbitrary
     Shapes, TR^^, Computer Science Department, University of
     Rochester, October, 1979 (a); submitted to Pattern Recognition.

Ballard, D.H., Strip Trees: A Hierarchical Representation for Map
     Features, Proc. IEEE Conf. on Pattern Recognition and Image
     Processing, Chicago, Illinois, August, 1979 (b).

Barrow, H.G., et al., Interactive aids for cartography and photo
     interpretation, Semiannual Technical Report, Artificial
     Intelligence Center, SRI International, November 1977.


Feldman, J.A., A distributed information processing model of visual
     memory, TR52, Computer Science Department, University of
     Rochester, December, 1979.

Feldman, J.A., Systems support for advanced image understanding,
     DARPA Semiannual Technical Report, May 1978.

Lantz, K.A., Brown, C.M. and Ballard, D.H., Model-driven vision
     using procedure description: Motivation and application to
     photointerpretation and medical diagnosis, 22nd International
     Symposium of the Society of Photo-optical Instrumentation
     Engineers, San Diego, Ca., August, 1978.

Maleson, J.T., Brown, C.M. and Feldman, J.A., Understanding natural
     texture, DARPA Image Understanding Workshop, October, 1977.

Russell, D.M., Where do I look now?: Modeling and inferring object
     locations by constraints, Proc. IEEE Conf. on Pattern
     Recognition and Image Processing, Chicago, Illinois, August,
     1979.

Russell, D.M. and C.M. Brown, Representing and using locational
     constraints in aerial imagery, Image Understanding Workshop,
     November, 1978.


Scheininger, U., and Sabbah, D., The display process, Internal
     Memo, Computer Science Department, University of Rochester,
     December, 1977.

Schudy, R.G. and D.H. Ballard, Towards an anatomical model of heart
     motion as seen in 4-D cardiac ultrasound data, IEEE Conf. on
     Computer-Aided
*22lHi|702i  isaasi, 

Selfridge, P.G., A flexible data structure for accessory image
     information, TR-45, Computer Science Department, University of
     Rochester, May, 1979.

Selfridge, P. and Sloan, Jr., K.R., "Raster Image File Format
     (RIFF): An Approach to Problems in Image Management," TR52,
     Computer Science Department, University of Rochester, May,
     1979.

Sloan, Jr., K.R., Rochester vision library documentation, Internal
     Memos, Computer Science Department, University of Rochester,
     1977 - 1978.

Sloan, Jr., K.R., and Brown, C.M., "Color Map Techniques," Computer
     Graphics and Image Processing, 10, 4, August, 1979.

Sloan, Jr., K.R., and Tanimoto, S.L., "Progressive Refinement of
     Raster Images," TR-1 ', Computer Science Department,
     University of Rochester, November, 1978.

Shani, U., B-splines as an embedding for generalized cylinders,
     Computer Science Department, University of Rochester,
     forthcoming.




SPATIAL  UNDERSTANDING 


Thomas  O  Binford 


Artificial  Intelligence  Laboratory.  Computer  Science  Department 
Stanford  University,  Stanford,  California  94305 


Abstract 

We have introduced a representation mechanism in
ACRONYM for specific and generic objects and partially
specified scenes. The mechanism relies on specialization by
constraints on a class of variables called quantifiers. The
specialization mechanism is used in implementing constraints
in interpretation associated with alternate candidate model
matches. The rule language of ACRONYM has been
entirely revised to simplify rule sets.

A powerful system has been developed for determining
parameters of object models in interpretations of image
descriptions. This enables subsequent information gathering
and detailed testing of small object structures.


INTRODUCTION 

We describe extensions to ACRONYM, the model-based
interpretation system. ACRONYM includes a geometric
modeling subsystem, a geometric reasoning subsystem, and
subsystems for description, prediction, and interpretation. The
geometric modeling system supports a high level language for
object models as structures of generalized cylinders. A
rule-based geometric reasoning subsystem is used by the
other subsystems. We use the word description to mean a
structure instantiated from representation elements, a
structure of relations and primitives at all levels. By
description process we mean building up structural
descriptions from image level to the level of volume or
surface. By prediction, we mean synthesizing structural
descriptions at the level of observables (edges, ribbons,
surfaces in stereo) or closely related level. Prediction and
description are closely complementary and interact closely in
that they share the same knowledge base. We report work in
all these parts of ACRONYM, but particularly in geometric
reasoning and interpretation with partially constrained object
models and scenes, and in determining model parameters in
interpretation.

ACRONYM 

ACRONYM interprets in the domain of volumes; we are
working on two approaches to go from images to volumes.
We have developed the predictive approach furthest, to
make a first tentative identification between image features
and observables of objects, then to make detailed verification
of those interpretations. Lowe has built an important element
of the detailed verification stage, a program which solves for
model parameters given an identification of image
descriptions with observables of a model ([6]). Given accurate
model parameters, gathering of additional information and
detailed search for small features can be carried out. The
system provides a natural way of using partial knowledge as
constraints on model parameters, e.g. the fact that an aircraft
is on the ground implies its support points are in a
horizontal plane. The system determines model parameters
and a coordinate transform which relates the object to the
observer frame. It includes articulated models and constraint
relations on parameters. It sets up a search system
automatically with appropriate parameters. The solution is
expressed in terms of image features which are lines and
points; lines are a natural output from image descriptions.
Newton-Raphson search typically converges in fewer than 6
iterations, with each iteration executing in about 20 msec on a
DEC KL10 in compiled MACLISP. We are extending the
system to evaluate which parameters can be determined, to
set up constraint conditions on those parameters which can
be determined, to set up initial estimates for search
parameters, and to evaluate which measurements should be
made to determine parameters which have not been
sufficiently constrained.
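
The iterative parameter search described above can be illustrated with a minimal sketch. It is not Lowe's program: it assumes a much simpler problem (a planar model with three unknowns, a rotation angle and a 2-D translation, matched to image points) and uses the Gauss-Newton form of Newton's method. The names `fit_pose` and `solve_linear` are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_pose(model_pts, image_pts, max_iter=20, tol=1e-12):
    """Assumed toy problem: find (theta, tx, ty) mapping model points onto image points."""
    theta, tx, ty = 0.0, 0.0, 0.0
    iteration = 0
    for iteration in range(1, max_iter + 1):
        c, s = math.cos(theta), math.sin(theta)
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for (px, py), (qx, qy) in zip(model_pts, image_pts):
            # Residual of the transformed model point against the image point.
            rx = c * px - s * py + tx - qx
            ry = s * px + c * py + ty - qy
            # Rows of the Jacobian with respect to (theta, tx, ty).
            jx = (-s * px - c * py, 1.0, 0.0)
            jy = (c * px - s * py, 0.0, 1.0)
            for row, r in ((jx, rx), (jy, ry)):
                for i in range(3):
                    jtr[i] += row[i] * r
                    for j in range(3):
                        jtj[i][j] += row[i] * row[j]
        step = solve_linear(jtj, [-v for v in jtr])
        theta, tx, ty = theta + step[0], tx + step[1], ty + step[2]
        if max(abs(v) for v in step) < tol:
            break
    return theta, tx, ty, iteration
```

With exact correspondences and a moderate initial error, a search of this kind settles in a handful of iterations, which is consistent in spirit with the iteration counts quoted above.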

GEOMETRIC REASONING

Previously, the rule language was quite limited. The data
base for rules consisted of assertion triples. Because most
geometric information was in Object and Observability
graphs, most manipulations took place outside the rule
mechanism in the form of side effects which were not
transparent to the rule based reasoning. Brooks has made a
completely new, much generalized rule language which works
with Object and Observability graphs directly. Rule sets are
much simpler and clearer, much less is hidden, and the
system has much more power. The rule base for aircraft
recognition has been redone in the new rule language.

Brooks has introduced a new mechanism for dealing with
partially specified objects and scenes ([2]). ACRONYM
incorporates restriction nodes which represent the following
with one mechanism: 1. instances or subclasses of models; 2.
multiple aspects of a single object; and 3. multiple candidate
ribbons matching a single object and multiple object
interpretations for single image ribbons. They all are
specializations of partially specified information,
represented by quantifier identifiers whose values
satisfy [...].

The interpretation phase uses quantifiers in matching image
descriptions to [...] of models. When the interpreter [...]
of image features such as ribbons, it constrains the
quantifiers of candidate instances of object models. At the
same time, it checks consistency of [...]. This provides a
powerful mechanism for making globally consistent
interpretations of local constraints. The interpreter has
been changed to a rule-based control [...] in order to
satisfy these new requirements. Figure 1 shows one aspect of
ACRONYM structure.

The planner has new capabilities for [...] of
quasi-invariant observables using symbolic [...] from
partially-specified [...] quasi-invariants which it can
deduce from special information about the particular case,
including dynamic information which arises in the course of
matching.


FIGURE 1. ACRONYM structure. Geometric modeling and
geometric reasoning support the PREDICT, INTERPRET, and
DESCRIBE subsystems. Prediction and description each span
the levels OBJECT, VOLUME, SURFACE, RIBBON, EDGE, and IMAGE.


A model editor called MODITOR has been added to ACRONYM.
MODITOR operates at the textual level, traversing symbolic
models. A system for generating stereo pairs for viewing has
been added. Work is underway toward a hidden surface display
algorithm [...] for human visualization and for the
predictor [...]. A priority graph scheme enables a [...]
of display computation to be done once at the time
of modeling, instead of each time a display is generated.


We are converting our [...] to MACLISP, to integrate all
capabilities in ACRONYM. The conversion is intended also to
facilitate portability to the VAX system, as well as moving
toward compatibility with a segment of the Image
Understanding community. We advocate ARPA development of a
MACLISP/LISPM portable system to complement ADA. We
are planning to adopt graphics and image handling protocols
compatible with some subset of IU groups; SRI's proposals
[Quam] look like good choices. We have transported Nevatia
and Babu's edge finding and curve linking system to the
WAITS time-sharing system ([10]).


STEREO  VISION 


Arnold and Binford are working out detailed geometric
constraints for stereo correspondence. Any correspondence
between two images represents a particular evaluation
function and search procedure among all the possible
correspondences. CDC minimized mismatch along epipolar
lines on a line to line basis, using dynamic programming
with a heuristically chosen evaluation function. We are
designing an evaluation function from first principles, while
considering solution procedures which include interline
constraints. The dynamic programming procedure requires
significant improvements. First, it does not have solutions
with overhang because two sequences of edges must have
monotonic correspondence. Second, it should include interline
constraints; we have generalized the procedure to include
interline constraints, however it requires prohibitive
computation cost.


It is well known that two views of n indistinguishable points
have n-squared ambiguous correspondences. Along a single
epipolar line, after all local distinctions between edges are
made, the n-squared ambiguity remains among all
indistinguishable subsets. If two views are interpreted by
mapping onto a single underlying surface, then stretching
and cutting in the image correspond to tilting, bending, and
folding of the surface. With n indistinguishable edges there
appear to be of order n*N surface interpretations (where N is
the number of occlusions at depth discontinuities), given
reasonable interpretation assumptions. That is a significant
reduction, small enough to be enumerable. However, there
are powerful constraints which reduce that number. An
important general constraint is occlusion. Arnold has derived
an image condition necessary for occlusion; we intend to find
a related sufficient condition. We have found additional
strong local constraints on surface interpretations. These will
be described in a forthcoming paper.


One of the strongest constraints is continuity of edges from
line to line. Arnold used continuity previously [Arnold
7?] with considerable success without these additional
geometric constraints. Liebes is developing constraints for
cultural scenes, for horizontal and vertical surfaces,
particularly planes and cylindrical surfaces. Vertical surfaces
give special problems in establishing correspondence;
however there are strong special case constraints for these
surfaces. Those constraints are useful also in establishing
[...] cultural areas for passive navigation using [...].


Baker describes an edge-based stereo mapping system which
searches for consistent edge correspondences line-by-line
along epipolar lines ([1]). It then rejects pairings which
violate interline continuity. Implementation of the system was
motivated by a desire for speed; it executes in about 30
seconds for 256 x 256 images. Edges are obtained by zero
crossings of 1x7 and 7x1 bar masks. They are paired on the
basis of matching contrast or intensity on one side of the
edge. For each edge there is a set of possible matches.
Formerly, a branch and bound search process was used to
establish correspondence. That has been replaced by a
Viterbi algorithm dynamic programming method. The
Viterbi algorithm is much faster. It returns only a single best
solution, however, which may be error sensitive. That is, the
locally optimal solution for single lines may not be globally
consistent. A later stage removes some of these errors. The
program uses a coarse to fine search procedure like the
binary search correlator introduced by ([8]), however limited
to single epipolar lines. Branch and bound search and
dynamic programming both sought solutions which
maximize the number of edges which matched; among
solutions with equal numbers of edge matches, they sought
solutions which minimize the sum of squared errors of
intensities of intervals to either side. Pairings for individual
epipolar lines are tested for consistency by testing depth
continuity. The consistency procedure follows connected edges
in both images, calculating local mean and standard
deviation of disparity, and removing pairings by a sequence
of tests on disparity.
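
The matching criterion for a single epipolar line (maximize the number of matched edges, then minimize the sum of squared intensity errors, subject to monotonic ordering) can be sketched as a small dynamic program. This is an illustrative assumption, not Baker's implementation: edges are reduced to hypothetical (position, intensity) pairs, a single intensity per edge stands in for the intervals to either side, and the caller supplies the `compatible` test.

```python
def match_edges(left, right, compatible):
    """Match two ordered edge lists from one epipolar line.

    left, right: lists of (position, intensity) in order along the line.
    compatible:  function deciding whether two edges may be paired.
    Returns index pairs (i, j) in increasing order (monotonic correspondence).
    """
    n, m = len(left), len(right)
    # best[i][j] = (-(number of matches), squared error) for left[:i], right[:j];
    # tuples compare lexicographically, so more matches win, then lower error.
    best = [[(0, 0.0)] * (m + 1) for _ in range(n + 1)]
    choice = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [(best[i - 1][j], "skip_left"),
                       (best[i][j - 1], "skip_right")]
            if compatible(left[i - 1], right[j - 1]):
                k, err = best[i - 1][j - 1]
                d = left[i - 1][1] - right[j - 1][1]
                options.append(((k - 1, err + d * d), "match"))
            best[i][j], choice[i][j] = min(options, key=lambda o: o[0])
    # Trace back from the full problem to recover the chosen pairings.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        step = choice[i][j]
        if step == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "skip_left":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs
```

Because the recurrence only ever advances both indices together on a match, the result cannot represent overhang, which is the monotonicity limitation noted earlier for dynamic programming on single lines.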


Clarkson and Binford are evaluating feasibility of improving
the solution program for the stereo camera transform ([3]) to
make it more stable and robust, i.e. less sensitive to errors of
mismatched points, to improve speed of solution, and to
make a single solution program for both degenerate and
non-degenerate cases. The solution must determine a rotation
of one camera coordinate system relative to the other (3
parameters) and a translation unit vector of one camera
center relative to the other (2 parameters). Rays through the
two lens centers to a single space point are coplanar. This
provides one constraint per point. Translation parameters are
poorly determined in the degenerate case with little relief
compared to camera distance. We are investigating solutions
which separate out subspace solutions. Clarkson has found a
condition using pairs of correspondences to determine the
rotation independent of the translation. He is evaluating
whether that constraint system has a computationally
effective solution. We are also considering using
corresponding lines, which are natural for edge operators,
and considering alternative parameterizations which may
lead to simplified solutions.
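
The coplanarity condition, one constraint per corresponding point, can be written as a scalar triple product. The sketch below is an assumption about conventions rather than the solution program of ([3]): `rotation` is taken to map camera-2 coordinates into the camera-1 frame, `t_dir` is the unit translation direction between lens centers in the camera-1 frame, and all function names are hypothetical.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m, v):
    return tuple(dot(row, v) for row in m)


def transpose(m):
    return [list(col) for col in zip(*m)]


def rot_y(angle):
    """Rotation about the y axis; one of the three rotation parameters."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def coplanarity_residual(ray1, ray2, rotation, t_dir):
    """Zero when ray1, the rotated ray2, and the baseline lie in one plane."""
    return dot(ray1, cross(t_dir, mat_vec(rotation, ray2)))
```

A solver would adjust the three rotation parameters and the two angles that fix `t_dir` until these residuals vanish over all matched points. The residual carries no information about baseline length, which is why only a unit translation vector is recoverable.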


EDGE-BASED DESCRIPTION


We are developing an improved edge finding and curve
linking program based on extensions of the approach of the
Binford-Horn system ([5]). At that time we considered the
following problems and solutions for them: 1. Smooth
shading gives continuous areas of false edges with gradient
operators. Solution: "lateral inhibition", difference from the
local average. 2. Low contrast edges are difficult to find.
Solution: directional derivatives of laterally inhibited signal;
sequential detection edge-linking algorithm. 3. Gradient
operators do not provide accurate edge estimates (a function
is flat near its maximum). Solution: interpolate edges from
zero-crossings of the laterally inhibited signal. 4. Texture and
surface marks provide spurious edges. Solution: use
directional derivatives and use a non-linear measure
(gaussian residuals) in curve linking. We are working toward
improved localization in position and angle, curve linking
which is isotropic in angle, and improved treatment of
spurious marks.

Marr and coworkers have given a valuable alternative
motivation for the use of zero crossings, based on analogy
with Logan's theorem on characterizing one-dimensional
waveforms from zero crossings ([7]). They provide powerful
insight into the role of zero crossings in natural stereo vision.


PLANS

Research will continue to extend the interpretation
capabilities of ACRONYM. The module for determination
of model parameters will be integrated into ACRONYM and
it will be extended to deal with partial information. The
representation mechanism for quantifiers and specialization
by constraints will be completed. Both will be used in
experiments in interpretation with aircraft and vehicles.

We will work on extending descriptive parts of
ACRONYM beyond edges and regions to surfaces and
volumes, using generalized cylinders. In addition, we will
integrate stereo with ACRONYM and implement a part of
the large descriptive system of ([9]).


References

[1] Baker, H.H., "Edge-Based Stereo Correspondence"; Proc.
ARPA Image Understanding Workshop, Univ. of Md., May 1980.

[2] Brooks, R., "Representing and Reasoning about Partially
Specified Scenes"; Proc. ARPA Image Understanding
Workshop, Univ. of Md., May 1980.

[3] Gennery, D., "Stereo Camera Transform Solution"; Proc.
ARPA Image Understanding Workshop, USC, Nov. 1979.

[4] Henderson, R., "Automatic Stereo Reconstruction of
Man-Made Targets"; SPIE Proc., Huntsville, Aug. 1979.

[5] Horn, B.K.P., "The Binford-Horn Edge Finder", MIT AI
Memo 285, revised December 1973.


[6] Lowe, D., "Solving for the Parameters of Object Models
from Image Descriptions"; Proc. ARPA Image Understanding
Workshop, Univ. of Md., May 1980.

[7] D. Marr, E. Hildreth; "Theory of Edge Detection"; AI
Memo 518, AI Lab MIT, April 1979. D. Marr, T. Poggio, "A
Theory of Human Stereo Vision," AI Memo 451, MIT, Nov.
1977. W.E.L. Grimson and D. Marr, "A Computer
Implementation of a Theory of Human Stereo Vision";
Proc. Image Understanding Workshop, Palo Alto, April 1979.

[8] Moravec, H.P., "Towards Automatic Visual Obstacle
Avoidance", Proc. IJCAI-5, MIT, Boston, Aug. 1977.

[9] Nevatia, R., "Structured Descriptions of Complex Curved
Objects for Recognition and Visual Memory", AI Lab,
Stanford University, Memo AIM-250, CS-464, ADA003486,
October 1974.

[10] Nevatia, R., and Babu, K.R., "Linear Feature Extraction
and Description", Proc. of IJCAI-79, Tokyo, Aug. 1979,
639-641.


THE  SRI  IMAGE  UNDERSTANDING  PROGRAM 


M. A. Fischler (Principal Investigator)
SRI  International 
Menlo  Park,  California  94025 


INTRODUCTION 

Research at SRI International under the ARPA
Image Understanding Program was initiated to
investigate ways in which diverse sources of
knowledge might be brought to bear on the problem
of analyzing and interpreting aerial images. The
initial phase of research was exploratory and
identified various means for exploiting knowledge
in processing aerial photographs for such military
applications as cartography, intelligence, weapon
guidance, and targeting. A key concept is the use
of a generalized digital map to guide the process
of image analysis. The results of this earlier
work were integrated into an interactive computer
system called "Hawkeye." This system provides
necessary basic facilities for a wide variety of
tasks in cartography and photo interpretation, and
it provides a framework within which other
applications can be readily demonstrated.

Research subsequently focused on development
of a program capable of expert performance in a
specific task domain--road monitoring. The primary
objective of this ongoing research is to build a
computer system that "understands" the nature of
roads and road events. It is intended that it be
capable of performing such tasks as:

(1) Finding roads in aerial imagery.

(2) Distinguishing vehicles on roads from
shadows, signposts, road markings, etc.

(3) Comparing multiple images and symbolic
information pertaining to the same road
segment, and deciding whether significant
changes have occurred.

The general approach, and details of technical
progress on the components of the Road Expert, are
contained in References [2-6]. We are now
integrating these separate components into a
coherent system that facilitates testing and
evaluation and will be in a form suitable for
transfer to the ARPA/DMA Integrated Demonstration
System ("testbed"). Plans for the Road Expert
demonstration system are presented in Reference
[...].

[...] two new [...]. The first is a
joint ARPA/DMA program to provide a framework for
demonstrating the applicability of image
understanding research (from throughout the entire
IU community) to military problems in general, and
to problems of automated cartography in
particular. The plans and progress in this effort
are described later in this paper.


The second effort is to broaden the scope and
generality of our image understanding research,
specifically in the areas of 3-D terrain
understanding, perceptual reasoning, and image
description and matching. A complementary research
program (described in Reference [7]), jointly
supported by ARPA and NSF, augments those
investigations by focusing on fundamental
computational principles underlying early stages of
visual processing in both man and machines.

THE ARPA/DMA INTEGRATED DEMONSTRATION
SYSTEM ("TESTBED")


Overview 

ARPA and DMA have jointly established an
integrated demonstration system ("testbed"), with
SRI as the integrating contractor. The system,
being developed at SRI, is intended to be
used for demonstrating and evaluating the
applicability of IU research to cartography. For
this purpose it will have a user interface that
simulates the environment of a cartographic work
station consisting of a computer with CRT terminal,
[...] with track ball, and a digitizing
tablet. With the exception of image digitization,
which for [...] purposes will be performed off-line
by DMA, the system will support all major steps in
[...], with a continuously evolving degree of
automation.

Initially the system will allow interactive
creation and editing of digital maps in a fashion
similar to Hawkeye [1]. Existing maps, for
example, can be overlaid on new imagery, edited
and extended using a variety of interactive aids
for tracing linear features and modeling objects.
The system will also allow remote [...] (over
the ARPANET) of automated and semi-automated
techniques developed by IU contractors. Those
techniques whose utility and reliability justify
the [...] integration will then be incorporated into
the system.

The system will also be usable as a research
facility, providing the IU and DMA communities with
access to the most advanced tools available. This
[...] of compatibility standards for
programs and data will encourage building upon the
work of others, allowing more ambitious projects to
be undertaken. Furthermore, the availability of
common data sets will promote more systematic
evaluation of competing techniques.

The above objectives share the requirement for
a flexible architecture so that contributions
developed in a diverse community can be fully
utilized. Integration support must be provided for
a wide range of contributions, from primitive image
processing techniques (e.g., an edge follower) to
stand-alone subsystems (e.g., an expert system for
terrain modeling). Language and operating system
support must be sufficiently broad to encompass
programs running under the multitude of systems
used throughout the community. These systems can
be expected to change continuously during the
testbed's lifetime.

To meet these requirements, the system will be
configured as a library of application modules that
accept input and deposit results in a shared
global data base. For normal research and
development, modules can be directly controlled in
an interactive environment via the keyboard and
graphical devices. For demonstrations a front-end
process is interposed, simulating the environment
of a cartographic work station similar to Hawkeye.
This interface facilitates communication with the
system via menus (e.g., ZOG) or limited natural
language (e.g., LIFER, RITA) and includes a "help"
facility.

Each application module performs a well-
defined, high- or low-level task. Modules are
independently compiled, so each can be implemented
in a different language and can reside on a different
processor. Modules interact with each other by
means of a standard interfacing mechanism,
resembling a procedure call. Details of how
control is actually passed will, of course, vary,
depending upon the level of module integration
(same address space, same processor, or remote
processor).

Most data interactions will be effected by
accessing  the  shared  data  base.  Modules  operate  on 
common  data  from  the  data  base  and  deposit  their 
results  back  into  the  data  base,  where  they  will  be 
available  for  display  or  subsequent  processing  by 
other  modules. 

The  data  base  is  accessed  via  a  uniform  query 
language,  which  enforces  compatibility  and 
maintains  integrity  without  constraining  a  module's 
Internal  representation.  Modules  need  not  know  the 
source  of  the  data  they  use  nor  who  will  use  their 
results. For example, a program that needs the
locations of edges in Image X will look in the data
base; if they are not there, the program requests
the use of an edge-locator module which will
deposit  results  tagged  as  "edges  for  image  X"  in 
the  data  base,  where  they  will  remain  available  for 
future  use.  The  data  base  is  thus  the  key  to 
modularity  in  a  large  integrated  system. 
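
The look-up-or-compute behavior in the edge example can be sketched in a few lines. This is a hypothetical illustration only: the class and method names (`SharedDataBase`, `request`, `deposit`, `register_module`) and the toy edge locator are assumptions, and nothing here reflects the testbed's actual query language.

```python
class SharedDataBase:
    """Toy stand-in for the shared global data base (hypothetical API)."""

    def __init__(self):
        self._results = {}   # (kind, image_id) -> deposited result
        self._modules = {}   # kind -> module able to produce that result
        self.images = {}     # image_id -> pixel data

    def register_module(self, kind, module):
        self._modules[kind] = module

    def deposit(self, kind, image_id, value):
        self._results[(kind, image_id)] = value

    def request(self, kind, image_id):
        key = (kind, image_id)
        if key not in self._results:
            # Not in the data base yet: invoke the producing module and
            # deposit its output so later requests find it directly.
            self.deposit(kind, image_id, self._modules[kind](self, image_id))
        return self._results[key]


def edge_locator(db, image_id, threshold=50):
    """Toy edge locator: indices where adjacent pixels differ strongly."""
    row = db.images[image_id]
    return [k for k in range(1, len(row))
            if abs(row[k] - row[k - 1]) >= threshold]
```

A caller asks only for "edges" of an image; whether the answer was already deposited or had to be computed is invisible to it, which is the decoupling between producers and consumers that the text describes.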

The VAX 11/780 has been selected as the main
testbed machine. All tightly integrated parts of
the system will be resident there. All other IU
machines will be viewed uniformly as remote ARPANET
hosts, including SRI's KL-10. However, the SRI KL
will have a high-speed channel to the VAX via a
shared disk, so application modules running there
will incur minimal overhead.

SRI is building a core system on the VAX. It
will include a data base and some general system
utilities, such as display servers, data base
manager, work station front end, and an ARPANET
gateway.

Development  of  application  modules  will 
proceed  in  parallel  at  all  IU  sites.  Each  site 
will  maintain  local  copies  of  relevant  parts  of  the 
official  data  base  (e.g.,  imagery)  needed  to 
develop  and  demonstrate  their  routines.  Each  site 
will  also  be  able  to  call  remotely  resident  modules 
over  the  network,  using  the  gateway  mechanism. 

SRI  will  use  the  ARPANET  to  exercise 
application  modules  in  the  context  of  the  core 
system.  As  utility  and  performance  justify, 
modules  will  be  imported  to  run  at  SRI  on  our  KL  or 
VAX.  As  the  final  demonstration  takes  form, 
critical  modules  may  be  recoded  to  integrate 
efficiently  with  the  core  VAX  environment. 

Having completed the system definition, our
time schedule is to do the detailed design and
construction of the core system in 1980, integrate
application modules provided by the IU community in
1981, and evaluate and possibly extend the system
in 1982.

Progress 

The VAX 11/780 computer system has been
installed  at  SRI.  The  U.C.  Berkeley  UNIX  and 
DEC/VMS  operating  systems  have  been  obtained  and 
subjected  to  benchmark  testing  to  determine  their 
relative  merits  with  respect  to  our  special  needs. 
Since  UNIX  has  been  agreed  upon  as  the  ARPA 
community  standard,  our  use  of  VMS  will  be 
compatible  with,  and  fully  support,  a  UNIX 
operating  environment.  Processing  environment 
standards  to  facilitate  the  community-wide  effort 
are  required  for  the  success  of  the  testbed  effort 
and  have  received  a  great  deal  of  attention  on  our 
part  in  recent  months.  A  document  describing  our 
detailed  plans  and  proposals  in  this  area  is 
currently  available.  (Requesters  can  obtain  this 
file  over  the  ARPANET  from  the  file  CTESTBEDDOO 
INTEGRATION. DOC0SRI-KL. ) 


RESEARCH  ACCOMPLISHMENTS 

Two  of  our  recent  research  accomplishments  are 
described  in  papers  published  in  these  proceedings. 

A  paper  by  Fischler  and  Bolles  [8]  introduces 
a  new  paradigm,  Random  Sample  Consensus  (RANSAC), 
for  fitting  a  model  to  experimental  data.  RANSAC 
is capable of interpreting/smoothing data
containing a significant percentage of gross
errors, and thus is ideally suited for applications
in automated image analysis where interpretation is
based on the data provided by error-prone feature
detectors.  A  major  portion  of  this  paper  describes 
the  application  of  RANSAC  to  the  Location 
Determination  Problem  (LDP):  given  an  image 
depicting  a  set  of  landmarks  with  known  locations, 
determine  that  point  in  space  from  which  the  image 
was  obtained.  New  results  are  derived  for  the 
minimum number of landmarks needed to obtain a
solution, and algorithms are given for computing
these minimum-landmark solutions in closed form.
These results form the basis for an automatic
system that can solve the LDP under severe viewing
and analysis conditions. Implementation details
and  computational  examples  are  also  presented. 
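
The consensus idea behind RANSAC can be shown on a deliberately simple problem. The sketch below is not the Location Determination Problem treated in [8]: it assumes 2-D line fitting, and the function names, tolerance, iteration count, and seed are all hypothetical choices made for illustration.

```python
import random


def fit_line(p, q):
    """Line through two points, returned as (slope, intercept)."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def least_squares_line(points):
    """Ordinary least-squares line through a set of points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def ransac_line(points, tolerance, iterations=200, seed=0):
    """Fit a line despite gross errors by keeping the largest consensus set."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        # Hypothesize a model from the minimum number of points (two).
        p, q = rng.sample(points, 2)
        if p[0] == q[0]:
            continue  # vertical pair; slope undefined in this parameterization
        slope, intercept = fit_line(p, q)
        consensus = [(x, y) for x, y in points
                     if abs(y - (slope * x + intercept)) <= tolerance]
        if len(consensus) > len(best):
            best = consensus
    # Refit using only the points that agree with the best hypothesis.
    return least_squares_line(best), best
```

The point of contrast with ordinary least squares is that gross errors never enter the final fit: they are excluded by the consensus test rather than averaged in.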

A paper by Quam [9] addresses problems
associated with the access of elements of large
multidimensional arrays when the order of access is
either unpredictable or is orthogonal to the
conventional order of array storage. Large arrays
(specifically arrays which are larger than the
physical memory available to store them) must be
accessed either by the virtual memory system of the
computer and operating system or by direct input
and output of blocks of the array to a file system.
In either case, the direct result of an
inappropriate order of reference to the elements of
the array is the very time-consuming movement of
data between levels in the memory hierarchy, often
costing factors of three orders of magnitude in
algorithm performance.

The access to elements of large arrays is
decomposed into three steps: transformation of the
subscript values of an n-dimensional array into the
element number in a one-dimensional virtual array;
mapping of virtual array position to physical
memory position; and access to the array element in
physical memory. The virtual-to-physical mapping
step is unnecessary on computer systems with
sufficiently large virtual address spaces.

A subscript transformation is proposed that is
believed to solve most of the order-of-access
problems associated with conventional array
storage. This transformation is based on an
additive decomposition of the calculation of
element number in the array into the sum of a set
of integer functions applied to the set of
subscripts as follows:

element-number(i, j, k, ...) = fi(i) + fj(j) + fk(k) + ...

The choices for the transformation functions
which minimize access time to the elements of the
array depend on the characteristics of the memory
hierarchy of the computer system and the order of
accesses to the elements of the array. It is
conjectured that there are easily obtained models
for system and algorithm access characteristics
from which a pragmatically optimum choice can be
made for the subscript transformation functions.

The use of tables to evaluate the functions fi
and fj makes the implementation very efficient
using conventional computers. When the array
accesses are made in an order inappropriate to
conventional array storage order, this scheme
requires far less time than conventional array
accessing schemes; otherwise the accessing times
are comparable. The semantics of a set of procedures
for array access, array creation, and the
association of arrays with file names is defined.
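
One way to see how an additive, table-driven subscript transformation can change locality is a tiled layout for a 2-D array. The particular choice of fi and fj below is an assumption made for illustration; this summary does not state which functions Quam's paper actually uses, and the function names are hypothetical.

```python
def build_tiled_tables(height, width, tile):
    """Tables fi, fj placing each tile-by-tile block contiguously in storage."""
    if height % tile or width % tile:
        raise ValueError("this sketch assumes dimensions are multiples of the tile size")
    tiles_per_row = width // tile
    fi = [(i // tile) * tiles_per_row * tile * tile + (i % tile) * tile
          for i in range(height)]
    fj = [(j // tile) * tile * tile + (j % tile) for j in range(width)]
    return fi, fj


def build_row_major_tables(height, width):
    """Conventional row-major storage expressed in the same additive form."""
    return [i * width for i in range(height)], list(range(width))


def element_number(fi, fj, i, j):
    """Two table look-ups and one addition per access."""
    return fi[i] + fj[j]
```

With these tables, walking down a column of a 4 x 6 array stored in 2 x 2 tiles touches two blocks of four elements, whereas the row-major tables touch four. Conventional storage is simply the special case fi(i) = i * width, fj(j) = j.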

For computer systems with insufficient virtual
memory, such as the PDP-10, a software virtual to
physical mapping scheme is used. Implementations
to access pixels of large images stored as two-
dimensional arrays of n bits per element for the
VAX and PDP-10 series computers are presented.


ACKNOWLEDGEMENT 

Contributors to the SRI Image Understanding
Program include: G. J. Agin, S. Barnard,
H. G. Barrow, R. C. Bolles, M. A. Fischler,
T. D. Garvey, G. A. Jirak, D. L. Kashtan,
L. H. Quam, J. M. Tenenbaum, A. P. Witkin, and
H. C. Wolf.


REFERENCES 


1. H. G. Barrow et al., "Interactive Aids for
Cartography and Photo Interpretation: Progress
Report, October 1977," in Proceedings: Image
Understanding Workshop, pp. 111-127 (October 1977).

2. M. A. Fischler et al., "Interactive Aids for
Cartography and Photo Interpretation,"
Semiannual Technical Report, SRI Project 5300,
SRI International, Menlo Park, California
(October 1978 and May 1979).

3. L. Quam, "Road Tracking and Anomaly
Detection," in Proceedings: Image
Understanding Workshop, pp. 51-55 (May 1978).

4. R. C. Bolles et al., "The SRI Road Expert:
Image-to-Database Correspondence," in
Proceedings: Image Understanding Workshop,
pp. 163-174 (November 1978).

5. G. J. Agin, "Knowledge-Based Detection and
Classification of Vehicles and Other Objects
in Aerial Road Images," in Proceedings: Image
Understanding Workshop, pp. 66-71 (April 1979).

6. M. A. Fischler, J. M. Tenenbaum, and
H. C. Wolf, "Detection of Roads and Linear
Structures in Aerial Imagery," in Proceedings:
Image Understanding Workshop (November 1979).

7. M. A. Fischler, "The SRI Image Understanding
Program," in Proceedings: Image Understanding
Workshop (November 1979).

8. M. A. Fischler and R. C. Bolles, "Random
Sample Consensus: A Paradigm for Model
Fitting with Applications to Image Analysis
and Automated Cartography," in Proceedings:
Image Understanding Workshop (April 1980).

9. L. Quam, "A Storage Representation for
Efficient Access to Large, Multi-Dimensional
Arrays," in Proceedings: Image Understanding
Workshop (April 1980).




PROGRESS  IN  IMAGE  UNDERSTANDING  RESEARCH  AT  USC 


Ramakant  Nevatia 
and 

Alexander  A.  Sawchuk 
Image  Processing  Institute 

Electrical  Engineering  and  Computer  Science  Departments 
University  of  Southern  California 
Los  Angeles,  California  90007 


We have continued work at various levels of
our IU system and started to evaluate techniques
for applications to DMA supplied images. These
activities are described in more detail in our
semiannual technical report [1], and the
following contains only a brief summary.

IMAGE  MATCHING 


Matching of an image to a symbolic map, or
a symbolic description of another image, is
central for the tasks of map updating and change
detection. In the past we have described
results using aerial images of areas such as San
Francisco, Stockton, San Diego, etc. [2].
These previous techniques used a simple matching
scheme, with each element in one description
being matched to the best corresponding element
in the other description, and not allowing for
any revision based on the matching of other
neighboring elements. We have now incorporated
a relaxation matching algorithm. This algorithm
differs from those used by Rosenfeld and
associates [3] in its use of a well defined
optimization criterion. Details of this
algorithm are described in a separate paper in
these proceedings [4].

TEXTURE ANALYSIS

We have continued development of our
structural texture analysis techniques. The
analysis uses micro-edges detected in a texture
and derives repetition pattern characteristics
of these edges. Our previous presentations have
described techniques for determining the width
of texture primitives and a repetition period,
if any. Our new techniques are also able to
extract the length of the primitives and thus
describe the shapes of the primitives. The
usefulness of these descriptions for recognition
of natural textures is currently being tested.


TEXTURE SYNTHESIS

We have several ongoing projects in
synthesis of natural textures from a stochastic
model using few parameters. The techniques
include auto-regressive modeling with
conditional expectations and algebraic
reconstruction techniques. Models for texture
synthesis are also useful for texture analysis
as they validate the sufficiency of the models
used for analysis.


SEGMENTATION

We are trying to use the texture analysis
techniques to aid in scene segmentation.
Texture features can be used as intensity
features for segmentation. However,
difficulties arise because texture features must
be measured over a window assumed to contain a
single texture only. Also, texture features
have many components and should be treated as a
vector. Faugeras and Lee give some preliminary
results in [1].

Another segmentation project is to develop
techniques for segmenting images that have
unimodal intensity (or color) histograms. This
happens typically when the image consists mostly
of a large background region, with small but
significant other regions. Faugeras and Bhanu
have developed a gradient relaxation technique
to modify the histogram to bring out the peaks
corresponding to the small regions [1].
Currently, this technique is applicable only if
two types of regions are present in the image.


HARDWARE IMPLEMENTATION

In continuing work with Hughes Research
Laboratories, Malibu, California, we are
investigating the use of VLSI technology for
hardware implementation of IU algorithms. We
have chosen to investigate the following
algorithms initially:

i) Nevatia-Babu Line Finder [5]

ii) Ohlander Region Segmentor [6]

iii) Laws Texture Analysis System [7]

The choice of the above three algorithms
was based on their computation intensive nature,
their use for a broad range of problems and
experience with a large number of images for the
first two. Also these algorithms are largely
local and hence easier to implement in VLSI
hardware, where reducing interconnections is
important. Further, the three algorithms have
common kernels, such as convolution, but also
require different subsequent processing. A
study of these should provide valuable feedback
on the feasibility of hardware implementation
for a large class of algorithms.

At this time, no decision on algorithms for
actual implementation has been made, and opinions
of the IU community are invited on the
suitability of the proposed algorithms as well
as suggestions for other algorithms.


REFERENCES 


1. R. Nevatia and A. A. Sawchuk, "Semiannual
Technical Report," USCIPI Report #960, March 1980.

2. R. Nevatia and K. Price, "Locating
Structures in Aerial Images," Proceedings of
ARPA Image Understanding Workshop, Palo Alto,
Ca., October 1977.

3. A. Rosenfeld, R. A. Hummel and S. W. Zucker,
"Scene Labeling by Relaxation Operations," IEEE
Trans. on Systems, Man & Cybernetics, SMC-6,
No. 6, pp. 420-433, June 1976.

4. O. Faugeras and K. Price, "Semantic
Description of Aerial Images Using Stochastic
Labeling," in these proceedings.

5. R. Nevatia and K. R. Babu, "Linear Feature
Extraction," Proceedings of the ARPA Image
Understanding Workshop, Pittsburgh, Pa.,
Nov. 1978.

6. R. Ohlander, K. Price and R. Reddy, "Picture
Segmentation Using a Recursive Region Splitting
Method," Computer Graphics and Image Processing,
Vol. 8, 1978, pp. 313-333.

7. K. Laws, "Textured Image Segmentation,"
USCIPI Report #940, January 1980.




TOWARD  THE  RECOGNITION  OF  CULTURAL  FEATURES 


Mohamad  Tavakoli 

Computer  Vision  Laboratory 
Computer  Science  Center,  University  of  Maryland 
College  Park,  MD  20742* 


ABSTRACT 

The goal of this research is to find a method
for recognition of cultural features such as roads
and buildings extracted from aerial photographs.
The approach involves several successive stages of
grouping of linear features. In this process, it
is highly desirable to avoid firm decisions at any
stage, but rather to make fuzzy or "probabilistic"
decisions whenever possible, thus deferring
commitments until they are confirmed by other evidence.
The decisions at each stage are based on as much
information as possible, and each stage uses an
appropriate type of data representation. In this
report, the stages of the feature extraction
process, as they are presently conceived, are
described, and examples of results obtained at the
first few stages are given.


INTRODUCTION 

Cultural features often contrast with their
surrounds, and are usually bounded by sharp,
locally straight edges. Thus in order to find cultural
features, edge detection and line finding
techniques can be used as a starting point. Such
techniques have been studied for many years;
partial surveys may be found in [1-5].

Several considerations have led us to use an
edge-based approach at the pixel level. We first
use local operators to estimate the magnitude and
direction of the gradient at each point. We then
use an iterative process at the pixel level to
adjust the magnitudes and directions. See [6] for
more details. The approach to feature extraction
at the University of Southern California [4] is
also edge-based, but it involves a one-step process
of non-maximum suppression and thresholding, rather
than an iterative, quantitative process. The USC
approach is thus computationally cheaper, but it is
probably more likely to make errors.

After extracting edges at the pixel level
we want to construct a more global data
representation. This will allow us to use more knowledge
about cultural features. In order to obtain a more
global representation, a global straightness
criterion is used in defining connected components of
edge pixels by requiring each pixel's direction to
be close to the average direction of the already
accepted pixels [6]. This breaks up smooth
curves into segments having relatively low net
change in slope from one end to the other. The
result is a set of edge segments, with each
of which we can associate various properties. At
this stage local properties are used to define a
figure of merit vector representing initial guesses
for object interpretation. Using this vector, one
can define the initial probability assignment.

*Permanent address: Shiraz University, College of
Engineering, Shiraz, Iran.


After the above stage, we use knowledge about
the linear features to find compatible and
antiparallel pieces and group the edge segments.
Finally, we update the probabilities of each line
segment based on these groups of line segments and
the physical descriptions of the objects.


In what follows, we present a detailed
description of the design of some of the low level
grouping operators.


INITIAL  PROBABILITY  ASSIGNMENTS 

As  already  mentioned,  a  global  straightness 
criterion  is  used  to  construct  the  edge  segments. 
We  associate  various  properties  with  each  segment, 
including  its  length,  average  strength,  etc.,  as 
well  as  properties  of  the  gray  levels  on  the  two 
sides  of  the  segment's  constituent  edge  pixels.  At 
this  level  there  are  many  edge  segments  which  do 
not  belong  to  objects  such  as  roads  or  buildings. 
They  are  edges  of  other  types  of  objects  or  simply 
noise. For simplicity these edges are called
"other edges."

One  of  the  most  useful  properties  that  can  be 
used  for  calculation  of  the  initial  probability 
assignment  vector  is  the  average  gray  level  in  a 
strip  on  each  side  of  the  segment.  These  averages 
can  then  be  compared  with  typical  gray  levels  of 
cultural  features  such  as  roads  or  buildings.  The 
minimum  difference  of  these  side  average  gray 
levels  from  the  typical  gray  levels  of  roads  and 
buildings is used as a figure of merit in the
calculation of initial probabilities.

Roads and buildings are the brightest objects
on the photographs that we used. They also have
similar gray levels (similar reflectances) in the
scene. Using these facts, an automatic method for
estimating the typical gray level is described in
what follows:

1)  Calculate  the  average  gray  level  in  a 
strip  on  each  side  of  each  line  segment. 

2)  Sort  the  line  segments  in  decreasing  order 
of  length. 

3) Select the longest p% of the lines
(usually 5%).

4)  Calculate  the  average  gray  levels  of  the 
brightest  sides  of  the  lines  selected  in 
step  (3). 

The  average  gray  level  calculated  in  this  way 
can  be  accepted  as  a  good  estimate  for  the  typical 
gray  level  of  the  objects. 
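
The four steps above can be sketched as follows. This is a minimal illustration, assuming that step (1) has already produced the two side averages for every segment; the function name `typical_gray_level`, the tuple layout, and the rounding-up of the selected count are assumptions, not details from the report.

```python
import math


def typical_gray_level(segments, percent=5.0):
    """Estimate the typical object gray level from the longest segments.

    segments: list of (length, g1, g2), where g1 and g2 are the average
    gray levels of the strips on the two sides of the segment (step 1).
    """
    if not segments:
        raise ValueError("at least one segment is required")
    # Step 2: sort in decreasing order of length.
    ordered = sorted(segments, key=lambda s: s[0], reverse=True)
    # Step 3: keep the longest p% (at least one segment; rounding up is an assumption).
    count = max(1, math.ceil(len(ordered) * percent / 100.0))
    longest = ordered[:count]
    # Step 4: average the brighter side of each selected segment.
    return sum(max(g1, g2) for _, g1, g2 in longest) / count


segments = [(100, 200, 50), (90, 100, 180)] + [(10, 30, 40)] * 18
print(typical_gray_level(segments, percent=5.0))
```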


To define the process of calculating the
figures of merit more precisely, note that each line
segment in the scene has two sides. The average gray
levels of the strips along the two sides of the
segment are denoted by $g_1$ and $g_2$ (see Figure 1).

Suppose that the typical average gray levels
of roads and buildings are $g_r$ and $g_h$ respectively.
Then the differences

$$f_1 = |g_r - g_1| \quad \text{and} \quad f_2 = |g_r - g_2|$$

measure the dissimilarity between the two sides of
the line segment and the gray level of a typical
road. Therefore, the function $s_r = \min(f_1, f_2)$ is
a measure of the gray level similarity between the
given line segment and a typical road. Similarly
the differences

$$h_1 = |g_h - g_1| \quad \text{and} \quad h_2 = |g_h - g_2|$$

measure the dissimilarity between the two sides of
the line segment and the gray level of a typical
building, and the function $s_h = \min(h_1, h_2)$ is a
measure of the gray level similarity between the
given line segment and a typical building.

Finally,

$$s = \min(s_r, s_h)$$

will be small if the gray level average on one of
the sides of the line segment is close to the gray
level of a typical building or road. Therefore,
if $s$ is small the line segment is more likely to
be an edge of a road or a building than to be an
"other" type of edge, whereas if $s$ has a large
value, the probability that the line segment is in
the "other" class is high.

In order to express the value of $s$ as a figure
of merit, linear functions are used. Let $d_i$ (i =
1,2,3) represent the figures of merit. To define
them as linear functions of $s$, the following linear
expression is used for calculation of a road figure
of merit. This linear function is shown in Figure
2 by thin solid lines.

$$
d_1 = \begin{cases}
\left(\dfrac{1}{dg_r} - \dfrac{1}{g_r}\right)(g - g_r) + 1 & \text{when } \dfrac{g_r^2 - 2 g_r\, dg_r}{g_r - dg_r} \le g \le g_r \\[2ex]
0 & \text{when } g < \dfrac{g_r^2 - 2 g_r\, dg_r}{g_r - dg_r} \text{ or } g > \dfrac{g_r^2}{g_r - dg_r} \\[2ex]
\left(\dfrac{1}{g_r} - \dfrac{1}{dg_r}\right)(g - g_r) + 1 & \text{when } g_r \le g \le \dfrac{g_r^2}{g_r - dg_r}
\end{cases}
$$

Here $dg_r$ is the deviation allowed for road gray
level; beyond it, the figure of merit of "other"
will become greater than the figure of merit of
road. The value of $g$ is

$$g = g_1 \quad \text{if } f_1 < f_2$$

and

$$g = g_2 \quad \text{if } f_1 > f_2$$

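Similarly the figure of merit for a line segment
being a piece of a building is shown by the thick
solid lines in Figure 2 and its expression is as
follows:

$$
d_2 = \begin{cases}
\left(\dfrac{1}{dg_h} - \dfrac{1}{g_h}\right)(g - g_h) + 1 & \text{when } \dfrac{g_h^2 - 2 g_h\, dg_h}{g_h - dg_h} \le g \le g_h \\[2ex]
0 & \text{when } g < \dfrac{g_h^2 - 2 g_h\, dg_h}{g_h - dg_h} \text{ or } g > \dfrac{g_h^2}{g_h - dg_h} \\[2ex]
\left(\dfrac{1}{g_h} - \dfrac{1}{dg_h}\right)(g - g_h) + 1 & \text{when } g_h \le g \le \dfrac{g_h^2}{g_h - dg_h}
\end{cases}
$$

Here $dg_h$ is the deviation allowed for building gray
level; beyond it, the figure of merit of "other"
becomes greater than the figure of merit of
buildings. The value of $g$ is

$$g = g_1 \quad \text{if } h_1 < h_2$$

and

$$g = g_2 \quad \text{if } h_1 > h_2$$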


When $s_r < s_h$ road is more probable; therefore
we use the dashed line for calculation of the
figure of merit for "other". Similarly when
$s_r > s_h$ buildings are more probable and the dotted
line is used for calculation of the figure of merit
for "other". In summary, the figure of merit for
the "other" class is calculated using the following
formula:

When $s_r < s_h$

$$
d_3 = \begin{cases}
|g_r - g| / g_r & \text{when } 0 < g < 2 g_r \\
1 & \text{when } g \ge 2 g_r
\end{cases}
$$

Similarly when $s_r > s_h$

$$
d_3 = \begin{cases}
|g_h - g| / g_h & \text{when } 0 < g < 2 g_h \\
1 & \text{when } g \ge 2 g_h
\end{cases}
$$

The initial probability for each label is
obtained by dividing the figure of merit of each
label by the sum of the figures of merit of the
three labels. Defining the initial probability
in this manner, we have

$$P^{(0)}(i) = \frac{d_i}{\sum_{j=1}^{3} d_j}, \qquad i = 1, 2, 3$$
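
The figures of merit and the normalization can be put together in a short sketch. It assumes that the allowed deviation is smaller than the typical gray level (so the slopes are positive), and it resolves the ties that the text leaves open (equal differences, or equal similarity measures) by falling through to the second alternative; the function names are hypothetical.

```python
def merit_class(g, g_typ, dg):
    """Triangular figure of merit: 1 at g_typ, falling linearly to 0.

    The slope 1/dg - 1/g_typ is the one used in the road and building
    expressions; values outside the triangle are clamped to 0.
    """
    slope = 1.0 / dg - 1.0 / g_typ
    return max(0.0, 1.0 - slope * abs(g - g_typ))


def merit_other(g, g_typ):
    """Figure of merit of the "other" label relative to g_typ."""
    if g >= 2.0 * g_typ:
        return 1.0
    return abs(g_typ - g) / g_typ


def initial_probabilities(g1, g2, gr, gh, dgr, dgh):
    """Return (P_road, P_building, P_other) for one edge segment."""
    f1, f2 = abs(gr - g1), abs(gr - g2)
    h1, h2 = abs(gh - g1), abs(gh - g2)
    sr, sh = min(f1, f2), min(h1, h2)
    g_road = g1 if f1 < f2 else g2   # side closer to the road gray level
    g_bldg = g1 if h1 < h2 else g2   # side closer to the building gray level
    d1 = merit_class(g_road, gr, dgr)
    d2 = merit_class(g_bldg, gh, dgh)
    d3 = merit_other(g_road, gr) if sr < sh else merit_other(g_bldg, gh)
    total = d1 + d2 + d3
    return d1 / total, d2 / total, d3 / total


print(initial_probabilities(200, 50, gr=200, gh=180, dgr=20, dgh=20))
```

At a deviation of exactly `dgr` from `gr` the road and "other" merits coincide, which matches the stated role of the allowed deviation.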



as  noise,  segments  can  be  discarded 


CALCULATION OF AVERAGE GRAY LEVEL ON BOTH SIDES
OF AN EDGE

For the initial probability assignment we have
to find the average gray level on both sides of a
line segment. The algorithm for calculation of
average gray level on both sides of an
edge segment is as follows:

1) Generate a strip of width "d" on each side
of the edge segment. Find the co-ordinates of
the points inside the two strips as well
as the number of points on each side.

2) Calculate the average gray level on each
side by dividing the sum of the gray levels
by the number of points on each side.

The algorithm starts by reading in the end
points of each edge segment and determining whether
the angle of the segment is between 0 and 90
degrees or is between 90 and 180 degrees. This
differentiation is necessary in order to define a
sense for each side of the line segment.

Referring to Figure 3, the end points are
denoted as end point 1 and end point 2. The
sides are denoted similarly. Using the conventions
in Figure 3, the following equations can be written
for each edge segment and for the boundaries of the
strips on both sides of each segment. If $\theta$ is
not equal to 90 degrees we have:


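$$y_0(x) = m x - m x_1 + y_1$$

$$y_{13}(x) = -x/m + x_1/m + y_1$$

$$y_{14}(x) = -x/m + x_2/m + y_2$$

$$y_{11}(x) = m x - m(x_1 + \Delta x) + y_1 - \Delta y$$

$$y_{12}(x) = m x - m(x_1 - \Delta x) + y_1 + \Delta y$$

where $\Delta x = d \sin\theta$, $m = \tan\theta$,
$\Delta y = d \cos\theta$ when $0 \le \theta < 90$
and $\Delta y = -d \cos\theta$ when $90 < \theta < 180$.

When $\theta$ is equal to 90 degrees, as shown in the
figure, we have:

$$x_0 = x_1 = x_2, \qquad x_{12} = x_0 - d, \qquad x_{11} = x_0 + d$$

$$y_{13} = y_1, \qquad y_{14} = y_2$$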


The digitized image is given in the form of an
array of elements g(i,j) such that
(i,j) are the Cartesian coordinates of a point and
g(i,j) is the value of the brightness at that point.
To calculate the gray level average, we sum g(i,j)
over the points in the strip
and divide by the number of points in the strip:

$$\text{Average gray level} = \sum g(i,j) / n$$

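The points (i,j) belonging to each strip satisfy
the following conditions.

1) When $0° \le \theta < 90°$

a) For side "1"

$$x_2 \le i \le x_1 + \Delta x, \qquad y_2 - \Delta y \le j \le y_1$$
$$y_{11}(i) < j < y_0(i), \qquad y_{14}(i) < j < y_{13}(i)$$

b) For side "2"

$$x_2 - \Delta x \le i \le x_1, \qquad y_2 \le j \le y_1 + \Delta y$$
$$y_0(i) < j < y_{12}(i), \qquad y_{14}(i) < j < y_{13}(i)$$

2) When $90° < \theta < 180°$

a) For side "1"

$$x_1 \le i \le x_2 + \Delta x, \qquad y_2 \le j \le y_1 + \Delta y$$
$$y_0(i) < j < y_{11}(i), \qquad y_{14}(i) < j < y_{13}(i)$$

b) For side "2"

$$x_1 - \Delta x \le i \le x_2, \qquad y_2 - \Delta y \le j \le y_1$$
$$y_{12}(i) < j < y_0(i), \qquad y_{14}(i) < j < y_{13}(i)$$

3) When $\theta = 90°$

a) For side "1"

$$x_1 - d \le i \le x_1, \qquad y_1 \le j \le y_2$$

b) For side "2"

$$x_1 \le i \le x_1 + d, \qquad y_1 \le j \le y_2$$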
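
The strip averages can also be computed without the case analysis on the angle. The sketch below is a swapped-in formulation, not the boundary equations above: each pixel is classified by its position along the segment and its signed perpendicular distance from it. The image layout (`image[j][i]` with i as the x co-ordinate), the side numbering, and the function name are assumptions made for illustration.

```python
import math


def side_averages(image, p1, p2, d):
    """Average gray level in a strip of width d on each side of segment p1-p2.

    image[j][i] is the gray level at x = i, y = j. Returns (side 1, side 2);
    a side with no pixels yields None. Which side is called "1" is arbitrary here.
    """
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    sums = {1: 0.0, 2: 0.0}
    counts = {1: 0, 2: 0}
    for j, row in enumerate(image):
        for i, gray in enumerate(row):
            # Position along the segment and signed distance across it.
            along = ((i - x1) * dx + (j - y1) * dy) / length
            across = ((i - x1) * dy - (j - y1) * dx) / length
            if 0 <= along <= length and 0 < abs(across) <= d:
                side = 1 if across > 0 else 2
                sums[side] += gray
                counts[side] += 1
    return tuple(sums[s] / counts[s] if counts[s] else None for s in (1, 2))


# A 10 x 10 test image: dark (50) for x < 5, bright (200) for x >= 5.
image = [[200 if i >= 5 else 50 for i in range(10)] for j in range(10)]
print(side_averages(image, (5, 0), (5, 9), d=2))
```

Scanning the whole image for every segment is wasteful; restricting the loops to the bounding box of the strip, as the inequalities above do, would be the obvious refinement.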


FINDING  PAIRS  OF  COMPATIBLE  SEGMENTS 

The models of roads and buildings
used in the program will now be described.

a) The model of edges belonging to a piece of a
road

From the function of a road, the following
physical and geometrical properties can be assumed
for this model:

1) The spectral properties of a road
correspond to materials such as concrete and
asphalt, and it is usually homogeneous.

2) A piece of an edge of a road should have
an anti-parallel edge.

3) A piece of an edge of a road is usually
connected to other neighboring pieces with
low angle deviation.

b)  The  model  of  edges  belonging  to  a  building 

Similarly, the physical and geometrical
properties of a building are:

1) The spectral properties of the roof of the
building.

2) The similarity of gray level inside the
edges constituting a building.

3) A piece of an edge of a building is
connected to other pieces.

4) The edges of a building form a closed
figure (usually with right angles).

In order to use the above models the geometric
relationships between each pair of lines within a
neighborhood in the scene should be studied. In
general, using the conventions of Figure 3, every
pair of lines in the scene belongs to one of
sixteen cases. These cases are listed in Table 1.
The entry "side" in Table 1 refers to the object
side of the given segment.

In order to find the object side of a line
segment, first the two values $s_r$ and $s_h$ are
calculated. Then, using the following decision rules
the object side is found:


when $s_r < s_h$
    if $f_1 < f_2$    side = 1
    else              side = 2

when $s_r > s_h$
    if $h_1 < h_2$    side = 1
    else              side = 2
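
The decision rules translate directly into a few lines of code. This is a sketch only: the case where the two similarity measures are equal is not covered by the rules as stated, so it is handled here, as an assumption, by the building branch. The function name is hypothetical.

```python
def object_side(g1, g2, gr, gh):
    """Return 1 or 2, the side of the segment judged to face the object.

    g1, g2: average gray levels on the two sides of the segment.
    gr, gh: typical gray levels of roads and buildings.
    """
    f1, f2 = abs(gr - g1), abs(gr - g2)
    h1, h2 = abs(gh - g1), abs(gh - g2)
    sr, sh = min(f1, f2), min(h1, h2)
    if sr < sh:
        # Road is the more probable label: compare against the road level.
        return 1 if f1 < f2 else 2
    # Building is the more probable label (ties land here by assumption).
    return 1 if h1 < h2 else 2


print(object_side(200, 50, gr=200, gh=180))
```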


Assume that the pair of lines under study are
labeled as line A and line B. The angles of the
two lines with respect to the x-axis are $\theta_A$ and
$\theta_B$ respectively. Depending on the orientation
of the pair of lines, different angles between the
two lines are possible. Figure 5 shows examples of
the angle $\theta$ between two lines. The plus sign
indicates the side of the road or building.
According to this convention the angle between two
collinear lines is 180°.

Referring to the model of edges constituting
the objects, each of these pairs of lines should
satisfy certain conditions in order to be accepted
as a candidate compatible pair. In general, these
conditions are:

a) Similarity of gray level of a strip along
a line connecting their ends with respect
to the object side of the pairs.

b) Conditions on the geometrical
configuration of the pair of lines.

To check the similarity condition, the average
gray level on the object side of the pair of lines
is calculated by

$$g = (g_A + g_B) / 2$$

where $g_A$ and $g_B$ are the average gray levels of the
strips along the object sides of lines A and B.
Then, the corresponding average gray level of a
strip along a line connecting the ends of the lines
is calculated. The difference between this value
and $g$ is a measure of the gray level similarity
of the line connecting the two ends with the pair
of lines. If this difference is within the limits
used in calculation of the figures of merit, then
the similarity condition is satisfied.

In a case where the distance between the ends
is very small, that is, comparable with the width
of the strip used in calculation of the gray level,
the similarity measure is not reliable. This is
because the number of points used in calculation
of the average gray level is limited. In cases
where the distance between the ends of the pair under
study is less than the width of the strip used in
calculation of the average gray level, the
similarity condition will not be checked. In this case
the pair is considered as a compatible candidate
if the appropriate geometrical conditions are
satisfied.

Geometrical conditions are important in making
two lines compatible. Figure 6 and Figure 7 show
examples of geometrically compatible and
incompatible pairs, respectively. To differentiate between
geometrically compatible and incompatible pairs,
certain constraints on the geometrical locations
of the end points are necessary. The ratio of the
distances between end points can be used to reject
the geometrically incompatible pairs.

In what follows, the first four cases in Table
1 will be analyzed and their compatibility
conditions derived. The other cases have similar
conditions.


Case (1)

Referring to Figure 8, there are five
different configurations. In this case the
compatibility of line A with respect to line B at end (2)
or the compatibility of line B with respect to
line A at end (1) is considered. Table 2
summarizes the conditions imposed in these cases. The
parameter m in the table is taken to be 1.5. This
allows some overlap between the pairs of compatible
line segments. The angle between the two lines is

$$\theta = \pi + |\theta_A - \theta_B| \quad \text{or} \quad \theta = \pi - |\theta_A - \theta_B|$$

depending on the relative values of $\theta_A$ and
$\theta_B$.

Case (2)

In this case seven different configurations
are considered. These are shown in Figure 9. The
compatibility of end (1) of line A or line B is
considered. Table 3 summarizes the required
conditions. The angle between the two lines in
this case is

$$\theta = 2\pi - |\theta_A - \theta_B| \quad \text{when } y_{a2} < y_0$$

and

$$\theta = |\theta_A - \theta_B| \quad \text{when } y_{a2} > y_0$$

where

$$y_0 = m_b x_{a2} - m_b x_{b1} + y_{b1}$$

and $m_b$ is the slope of line B. The conditions at
end (2) of the lines are similar to the end (1)
conditions. To find these conditions a1 and b1
should be changed to a2 and b2 except that in this
case

$$y_0 = m_b x_{a1} - m_b x_{b1} + y_{b1}$$

The side similarity for some configurations is
different in this case.

Case (3)

When the compatibility of end (1) of line A
with end (2) of line B is considered, there are
five different configurations. Figure 10 shows
these configurations. The conditions are summarized
in Table 4. The angle between the lines is

$$\theta = \pi + |\theta_A - \theta_B|$$

The other possibility is to study the compatibility
of end (2) of line A with end (1) of line B. Here
again there are five different configurations.
Figure 11 shows these configurations. The
conditions are summarized in Table 5. The angle between
the lines is

$$\theta = \pi - |\theta_A - \theta_B|$$


Case (4)

The conditions for this case are summarized in
Table 6 and Table 7. The different configurations
are shown in Figure 12 and Figure 13. The angle
between the lines is

$$\theta = |\theta_A - \theta_B|$$

when the compatibility of end (1) of line A is
considered. Similarly the angle is

$$\theta = |\theta_A - \theta_B|$$

when the compatibility of end (2) of line A is in
question.


So far the geometrical and similarity
conditions for the pairs of compatible pieces have been
found. In what follows the algorithm for finding
compatible pairs will be explained.


Algorithm  for  Finding  Compatible  Pairs 


1) Choose those line segments whose "other"
probability is not equal to 1.

2) For end "1" of each line, find the
shortest distances from other end points of
line segments.

3) Find the object side of the given line and
the other lines found in (2).

4) Check the geometrical and similarity
conditions for the given line and the other
lines found in (2). Reject those lines
for which the required conditions are not
satisfied.

5) If all the lines are rejected go to (8).

6) Find the angle of the line with respect to
the remaining lines in (4). Choose the
one which has the smallest angle (e.g.,
greater than 25°) with respect to the line
under study.

7) Choose the other end of the line found in
(6) and go to (2).

8) Choose the other end of the given line
and go to (2). If the other end has
already been tested go to (9).

9) Continue the above process for the other
line segments.
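
The linking loop can be sketched as follows. This is a simplified reading of steps (1) to (7): it follows a chain outward from one end of a segment and stops when no acceptable partner remains, without the bookkeeping of steps (8) and (9). The geometrical and similarity conditions and the angle convention are passed in as caller-supplied functions (`compatible`, `angle`), since the tables that define them are not reproduced here; those names, the `k` nearest-ends limit, and the 0/1 indexing of the two ends are all assumptions.

```python
import math


def nearest_ends(segments, a, end_a, k):
    """Step 2: the k end points of other segments closest to end end_a of a."""
    ax, ay = segments[a][end_a]
    found = []
    for b, seg in enumerate(segments):
        if b == a:
            continue
        for end_b, (bx, by) in enumerate(seg):
            found.append((math.hypot(bx - ax, by - ay), b, end_b))
    found.sort()
    return [(b, end_b) for _, b, end_b in found[:k]]


def find_partner(segments, p_other, a, end_a, compatible, angle,
                 k=3, min_angle=25.0):
    """Steps 1, 4 and 6: best compatible neighbour at one end, or None."""
    best = None
    for b, end_b in nearest_ends(segments, a, end_a, k):
        if p_other[b] >= 1.0:                    # step 1: certainly "other"
            continue
        if not compatible(a, end_a, b, end_b):   # step 4: caller-supplied test
            continue
        theta = angle(a, b)
        if theta <= min_angle:
            continue
        if best is None or theta < best[0]:      # step 6: smallest accepted angle
            best = (theta, b, end_b)
    return None if best is None else (best[1], best[2])


def follow_chain(segments, p_other, a, end_a, compatible, angle):
    """Step 7: keep moving to the far end of each partner that is found."""
    chain = [a]
    while True:
        partner = find_partner(segments, p_other, a, end_a, compatible, angle)
        if partner is None or partner[0] in chain:
            return chain
        b, end_b = partner
        chain.append(b)
        a, end_a = b, 1 - end_b


segments = [((0, 0), (10, 0)), ((11, 0), (20, 0)), ((10, 5), (10, 15))]
p_other = [0.0, 0.0, 1.0]
print(follow_chain(segments, p_other, 0, 1,
                   compatible=lambda a, ea, b, eb: True,
                   angle=lambda a, b: 180.0))
```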


FINDING  PAIRS  OF  ANTIPARALLEL  EDGES 

The edges of cultural features usually occur
in pairs, as in the sides of roads and of buildings.
To identify these features the edges should be
clustered into antiparallel pairs (i.e. pairs of
facing edges that are parallel but have opposite
senses). Clustering must take into account
information from the picture in the regions around the
edges. For example, a road usually has a uniform
gray level and thus it is reasonable to expect the
facing sides of an antiparallel pair of edges to
have similar gray levels. Previous work [4,7] has
restricted the choice of pairs to lines that are
closest neighbors. A method of pairing
antiparallel straight lines reported in [8] is based on
the distance between the lines, the amount by which
they overlap, and on whether or not other lines
are interposed.


The method described here finds the pairs of lines
that are anti-parallel up to a certain angle
difference (usually 25°) when similarity of gray level
between the pairs is satisfied.

The  basic  procedure  is  as  follows.  A  strip  is 
moved  along  the  object  side  of  each  edge  segment. 
The  movement  is  continued  until  the  similarity  is 
lost  or  the  distance  moved  is  greater  than  the 
largest  expected  object  size  in  the  scene.  While 
the  strip  moves,  it  hits  other  line  segments. 

Among these line segments the following segments
are rejected:

a) If they are not anti-parallel, that is, if the
angle difference is greater than a threshold.


The similarity condition is

$$|g - g_{move}| < \text{level of similarity}$$

where $g$ is the average gray level on the object side
of the edge segment and $g_{move}$ is the average gray
level of the moving strip.
The level of similarity used in the program is
taken as 7; this is a rather tolerant condition.
When the strip hits a candidate line, the
similarity level is changed to the contrast value
of the candidate line. Note that
this change of the level may stop the movement of
the strip.

Among the remaining lines the one with
the smallest distance is selected as the antiparallel.
To find the shortest distance between antiparallel
lines, perpendiculars are dropped from the end
points of each line to the other line. If the
intersection of the perpendicular with the facing
line is located outside of the line, the
distance is rejected. The remaining distances are
candidates for the distance between the lines. The
four distances are denoted by

$$d_i \quad (i = 1, 2, 3, 4)$$


2)  Generate  the  =  tr,  , 

*'•-  2 £ ss-  - 

”  E?‘h‘;  “  ««  —enr. 

distance  aro  arjd  the  totni  m 

!  •  u ;«  tfo  .peceL  ine 

different!  t  hose  iines  where  m  °  fler“ 

threshold  and'  Saft:cr1than  the  speEif^ 
opposite  to  the  .  cln8  side  is  net 

similarity  level C ri8lnal  line.  Set  rh 
the  Hr,  ^  levei  equal  to  m  bet  the 

1,06  f°Und  aad  continue  feheC°ntrast  «* 

4)  For  th  Process. 

r  the  candidate  line  c 
.  choose  the  one  which,  °U"d  ln  (3) 
ance.  Mark  the  line  T  7  dis- 

-  Process  it  again  ^  ln  °rder  not 


5) 


^  and  d4  are  rejected  and 

d  =  min(dl,d2) 

is  selected  as  the  ja 

he  distance  between  tl 

T°  check  wh  *u  6  U’°  llnes- 

lf  0  is  not  equal  to  Qn  a 
and  xl  *  xint  *  x2  th  f8rees 

the  intersect  nn, 

and  71  *  degrees 

be  tween  "t^'end01"1  ls 

Here  (xint.yint)  P°lnts- 

section  point.  3re  the  coordinates  of  th 

the  inter- 

This  method  of  fv-j, 

h“ tte  of 

*’  5™.“"  ‘8  “■  co®pared  »lth 

h)  When  several  n 

t7h0d  allows  alTorth^01"8  a  line,  the 
th«  —  iin,  „ 


""  Pr°C“S  f»  *"o  ™..f,lng 


rithm^is  "ST  r^"8  distance  Jn  tl 

]/*  of  the  siL  0%  !dj  U  ls  set  to  h6  al8°- 

taining  sman  k4f  the  Picture  p  be  equal  to  be 

in  order  to  LJeCtS  thla  distance  SCe"es  c°a~ 
difference  cj  beY^^1^  time  reduced 

not  sensitive  to  th^  arbltrarily.  Th„  6  angle 
®0Ve  along  the  oh  h  S  thresh°ld  since  Eh  Pr°8ram  is 
expected  that  we  *7  Slde  °f  tha  edge  an/^^ 
the  best  candidate  slde  of  the  obj^ct^  ^ 


STUDY OF SHADOWS OF BUILDINGS


One of the features that can be used for verification
of the recognition of buildings is the shadow of the
building. There are cases where a parking lot is
recognized as a building, since the two may have the
same size or shape; in such cases it is possible to use
the shadow for verification. To study this, the average
gray level of a strip on each side of every line segment
is considered together with the angle of the line
segment. Since each line has two sides, two average
gray levels g1 and g2 are associated with each line;
g1 and g2 are associated with angles θ and θ + π,
respectively.



Scatter plots of g1 and g2 with respect to θ
are shown in Figure 16 for all of the line segments
and in Figure 17 for the line segments after
noise cleaning. Figure 17 shows that at dark gray
levels, the population of points for θ > 180° is
greater than the population of points for θ < 180°.
This shows that in certain directions of
line segments, there exist shadows.

To study this quantitatively, let us
pick the darkest P% of the population. Let

n1 = number of dark points in the interval (θ, θ + r)

and


THE  RECOGNITION  PROCESS 

After application of the programs described
up to now, we have groups of compatible line segments
(also the angles between them) and pairs of
anti-parallel segments. Using the models of roads and
buildings, we want to update the

rlJ-c”  ~’2S2£VL  ITS- 

We begin by dividing the groups of compatible
pairs into the following categories:

A) Closed groups
B) Semiclosed groups
C) Other lines and groups

In what follows each of the above categories
will be explained in more detail.

A) Closed groups

By a closed group we mean that the start and
the end segment labels are the same. Figure 20
shows an example of this type of group. In this
figure A, B, C, ... are the labels in a compatible
group.

This type of closed group is a good candidate for
being a group of edges of a building. To check whether
this closed group is a house, we test for solidness on
the object sides, and also check that each line segment
in the group is antiparallel to a line in the group. To
check solidness we use the same operator that was
used in finding the antiparallel pairs. This test
also guarantees the similarity of gray level inside
the object.

The above check can also discriminate between the
cases (b) and (c) in Figure 20. Thus a closed
group with the above conditions can be considered
a house with good confidence. The shadow can be
used for further verification.
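
The closure condition (start label equal to end label) can be sketched in a few lines. This is a hedged illustration only: it assumes a group is stored as an ordered list of label pairs, as in the label table of Figure 20, and the function name is hypothetical. The solidness and antiparallel tests are not shown.

```python
def is_closed_group(pairs):
    """pairs: ordered list of (start_label, end_label) compatible pairs.

    The group is closed when consecutive pairs chain together and the
    last pair ends on the label the first pair started from.
    """
    if not pairs:
        return False
    # Each pair must start on the label where the previous pair ended.
    for (_, prev_end), (next_start, _) in zip(pairs, pairs[1:]):
        if prev_end != next_start:
            return False
    return pairs[0][0] == pairs[-1][1]
```

With the labels of Figure 20, `is_closed_group([("A", "E"), ("E", "C"), ("C", "F"), ("F", "A")])` holds, while dropping the last pair leaves an open chain.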



B)  Semiclosed  groups 


n2 = number of dark points in the interval (θ + π, θ + r + π)

for θ = 0°, 20°, 40°, ..., 340°. Plots of n1/n2 as a
function of θ for different values of P, for the
line segments after and before noise cleaning, are
shown in Figure 18 and Figure 19, respectively.
These figures show that there is a peak around 130°.
The peak is greater for the segments after noise
removal. As expected, as P is increased the peak
becomes smaller. This effect was tested on
several other scenes and similar results were
obtained.
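
The ratio n1/n2 can be computed along the following lines. This is a sketch under stated assumptions rather than the program used for Figures 18 and 19: the input layout (one angle and gray level per side of each segment), the interval width `width_deg` standing in for r, the half-open interval convention, and all names are hypothetical.

```python
def dark_point_ratios(samples, p_percent=10.0, width_deg=20.0, step_deg=20.0):
    """Return {theta: n1/n2} for theta = 0, step_deg, ..., below 360.

    samples   -- list of (angle_deg, gray_level), one per side of a segment
    p_percent -- the darkest P% of the samples are treated as "dark"
    width_deg -- interval width r (assumed; the value is not given above)
    """
    if not samples:
        return {}
    ranked = sorted(samples, key=lambda s: s[1])      # darkest first
    n_dark = max(1, int(len(ranked) * p_percent / 100.0))
    dark_angles = [angle % 360.0 for angle, _ in ranked[:n_dark]]

    def count_in(start_deg):
        # Dark points with angle in [start, start + width), modulo 360.
        return sum(1 for a in dark_angles
                   if (a - start_deg) % 360.0 < width_deg)

    ratios = {}
    theta = 0.0
    while theta < 360.0:
        n1 = count_in(theta)
        n2 = count_in(theta + 180.0)
        ratios[theta] = n1 / n2 if n2 else None       # None when n2 is zero
        theta += step_deg
    return ratios
```

A pronounced maximum of the returned ratio at some θ would correspond to the peak described above.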


A semiclosed group is defined as a group with
a gap less than the longest line connecting the
ends of compatible pairs in the group. Figure 21
demonstrates an example of this type. As in the
case of a closed group, if the following tests are
valid, then the group is accepted as a house with
good confidence.

I) Solidness

II) Each line in the group should be antiparallel to a
line in the group.

The tests used for checking closed groups are used
here too. The shadow can be used for further
verification.

C) Other lines and groups

Here again the model of the lines constituting
a house or road will be used for recognition of
the remaining lines or groups. The important
features are the angles between the compatible
pairs and information on anti-parallel pairs.
Figure 22 shows examples of possible cases that may
occur in the scene. In this figure, θmin is around
20°. Special care should be taken in cases where
the anti-parallel pairs or compatible pairs are not
available, due to cutoff at the edge of the frame.
The implementation of this part of the program
is in progress.

EXAMPLES 

In order to test the program, a set of digitized
images of a suburban area in Occoquan, VA
was used. The straight edges of the images were
extracted using the iterative enhancement technique
of [6]. The resulting lines and the original gray
scale picture were used as input to the program.
Figure 23 shows one of the input images and Figure
24 shows the set of lines extracted from it. The
data of Figure 24 (the end point coordinates), along
with the gray scale input picture, are the input to
another program for calculation of the average gray
levels on both sides of the line segments. The
width of the strip is taken to be four points. This
value was selected based on knowledge about the
resolution of the picture. After calculation of
the averages on both sides, the vector of figures of
merit is calculated according to Section 2. The
typical gray levels of roads and buildings are
taken to be equal in this example. The histogram of
the probability of "other" is shown in Figure 25.
From this histogram it is clear that we can
completely differentiate between two classes of
object boundaries, namely objects and noise. Figure
26  shows  the  line  segments  whose  probabilities  of 
being  a  piece  of  road  or  building  are  not  equal  to 
zero.  This  figure  shows  quite  an  improvement  in 
rejecting  the  noise  edges.  Figure  27  shows  the 
results  of  finding  compatible  segments,  and  Figure 
28 shows the results of finding anti-parallel
segments: the midpoints of the anti-parallel pairs
are connected together. Figure 29 shows the
good-confidence houses. Figures 30 to 35 and 36 to 41
show  the  results  of  the  program  on  two  other 
scenes.  So  far,  only  the  recognition  of  houses 
with  good  confidence  has  been  implemented.  Work  is 
in  progress  on  other  cases. 


REFERENCES 


1. A. Rosenfeld and A. C. Kak, Digital Picture
   Processing, Academic Press, 1979.

2. W. K. Pratt, Digital Image Processing, John
   Wiley & Sons, 1979.

3. L. S. Davis, A Survey of Edge Detection
   Techniques, Computer Graphics and Image Processing,
   Vol. 4, No. 3, Sept. 1975.

4. R. Nevatia and K. R. Babu, Linear Feature
   Extraction, USCIPI Report 840, 1978.

5. R. Bajcsy and M. Tavakoli, Computer Recognition
   of Roads from Satellite Pictures, IEEE
   Transactions on Systems, Man, and Cybernetics,
   Vol. SMC-6, No. 9, Sept. 1976.

6. S. Peleg and A. Rosenfeld, Straight edge
   enhancement and mapping, Computer Science TR-694,
   University of Maryland, College Park, MD,
   Sept. 1978.

7. R. Brooks, Global directed edge linking and
   ribbon finding, Proc. DARPA Image Understanding
   Workshop, Menlo Park, CA, April 1979, 72-

8. Ann Scher, Michael Shneier, and Azriel
   Rosenfeld, A method for finding pairs of
   anti-parallel straight lines, Computer Science
   TR-845, University of Maryland, College Park,
   MD, Dec. 1979.


Case      Segment A       Segment B       Similar to
number    case   side     case   side
 1        a      1        a      1        -
 2        a      1        a      2        -
 3        a      1        b      1        -
 4        a      1        b      2        -
 5        a      2        a      1        m
 6        a      2        a      2        #1
 7        a      2        b      1        #4 rotated 180°
 8        a      2        b      2        #3 rotated 180°
 9        b      1        a      1        #3
10        b      1        a      2        m
11        b      1        b      1        #1
12        b      1        b      2        #2 rotated 90°
13        b      2        a      1        #4
14        b      2        a      2        #8
15        b      2        b      1        #12
16        b      2        b      2        #6

Table 1. Different cases of pairs of line segments
according to the conventions of Figure 3.


Case 

1 


(a) 


Co) 

(c) 


(d) 


(e) 


Geometrical 

conditions 

Similarity  condi 
tions  for  the  line 

a2bl 

a2^1  <  a  b  /m 

If 

a2^i  <  d 

a2bl  <  aibi/m 

or 

xpbi  <d 

no 

check 

V9. 

or 

Xpa2  <  d 

\  ‘  XP  ‘  \ 

no 

check 

x  <  xh 
2  .  1 
y  *  y. 
a2  °1 


check  side  2 


check  side  1 


check  side  1 


check  side  2 


Table  2.  Geometrical  and  Similarity  conditions 
for  case  1. 




Case 

2 


(a) 


Geometrical 
condit ions 

albl  <  a2b2^m 
albl  <  alb2^m 
albl  <  a2bi/m 

'a-19. 


Similarity  conditions 
for  line  a^ 

If  a^bi  <  d  no  check 


Xb  <  Xp  *  xb  no  check 

2  k  □1 

Xpal<  Xpa2/nl’  Xpbl<xpb2/t” 


(b) 

ya2  <  y0 

If  ^ a  t  ^TJ  and  X  <  ( 
a  d  p 

%  <XYV  \ 

no  check 

else  check  side  1 

(c) 

ya2  <  y0 

If  3  ^  0  and  x  b  < 

a  d  pi 

'>xu  >  y,  &  y 
ai  bi  bi  ai 

no  check 

else  check  side  2 

(d) 

ya2  <  y0 

xa  *  xh  ,yk  >  y 

al  bl  bl  al 

check  side  1 

(e) 

ya2  >  y0 

If  9  ^  0  and  x  b  < 

a  b  pi 

x  >  x,  ,y  >  v 

no  check 

ai  bi  ai  bi 

else  check  side  2 

yQ  >  yn 

(f) 

a2  J0 

I£  9A  *  eB  and  ' 

x  <  x,  ,y  s.  v 

a,  b1  >ya ,  yb 

no  check 

1  1  1  1 

else  check  side  1 

(g) 

ya2  >  y0 

xa  s  xh  >y*  >  yK 

al  bl  al  bl 

check  side  2 

Geometrical 

conditions 

Similarity  conditions 
for  line  a^ 

Case  a^  <  a^/m 

If  axb2  <  d 

end  a]b2<a2b2/m 

1  alb2  <  albi 

no  check 

X  <  X  s.  X 

a2  p  ai 

Xp3l<  xpa2/m 

<3/  zpb2  <  xpbi/m 

no  check 

If  0  +  90° 

D 

x,  <  x  4  x, 

1  P  b2 

else  y,  <  y  <  v 

b2  p  \ 

xa  ^  X, 

(b)  al  %2 

ya  <  yh 
al  b2 

check  side  1 

X  <  xu 

(c)  al  b2 

ya  51  yh 

al  b2 

check  side  1 

X  <  x, 

(d)  al  b2 

If  Xpal<  d  no  check 

ya  >  yh 

31  b2 

else  check  side  2 

X  >  X, 

al  b2 

If  Xpb2<  d  no  check 

ya  *  yh 

al  b2 

else  check  side  2 

Table  4.  Geometrical  and 
for  case  3:  end 

similarity  conditions 

1  of  line  A. 

Table  3.  Geometrical  and  similarity  conditions 
for  Case  2. 


Geometrical 

conditions 

Case 

3 

end 

a2bl  <  aib2/m 

a2bi  <  a2b2 

2 

a2bl  <  aib]/m 

Xpa2<  xpal/m 

(a) 

xpbl<  xpb2/m 

X  ^  X  <  X 
a2  p  al 

If  0  f  90° 

D 

X,  *  X  <  x^ 

\  p  b2 
else  y  <  y  s, 
b2  p 

Similarity  conditions 
for  line  a^b^ 

If  a2bl  <  d 
no  check 


no  check 


(b) 

y,  Sy 
bl  a2 

If  Xpa2<  d  no  check 

X  >  X 

a2 

else  check  side  1 

(c) 

y .  <  y 

bl  a2 

X  s  X. 

a2  °l 

check  side  1 

(d) 

ya  S  yh 
a2  bl 

If  Xph-^<  3  no  check 

x  <  X 
a2  °1 

else  check  side  2 

(e) 

ya  S  yh 
a2  bl 

If  x  a  <  d  no  check 

P  l 

X  >  X, 

a2  bl 

else  check  side  2 

Table  5.  Geometrical  and  similarity  conditions 
for  case  3:  end  2  of  line  A. 


Geometrical 

Similarity  conditions 

Geometrical 

Similarity  conditions 

conditions 

for  line  a^b^ 

conditions 

for  line  a^b 

Case 

aibl  <  a2b2^m 

If  a^b^  <  d  no  check 

Case 

A 

a 2° 2  <  a^b^/m 

If  a.b,,  <  d  no  check 

L  L 

4 

end 

a  b.  <  a^b. /m 
11  x  1 

4 

end 

a2b2  **  a2bl^m 

1 

albl  <  alb2/ra 

2 

a2b2  <  alb2/m 

x  a.<  x  a  /m 

P  1  P  2 

Xpa2<  xpal/m 

xb<  x  br,/m 

pi  p  2 

x  bn<  x  b,  /m 

p  2  pi 

(a) 

X  <  X  2*  X 

a2  P  al 

no  check 

(a) 

x  a  ^  x  <  x 
p 2  p  a: 

no  check 

if  e0  t  90° 

D 

if  eD  t  9n° 

D 

X,  *  X  <-  X, 

bx  P  b2 

X  <  X  X, 

bl  P  b2 

else  y<  y  <y. 

b2  P  b1 

else  y,  y  <  y, 

bi  p  bi 

X  2“  X 

check  side  2 

y  >  x, 

a2  a  °2 

If  x  a  <  d  no  check 

(bO 

al  bl 

(b) 

P  2 

\  ’  \ 

S  ‘  \ 

else  check  side  1 

(c) 

X  <  X, 

*1  bl 

\ 1  \ 

check  side  1 

(c) 

X  ^  X, 

a2  b2 

a2  b2 

X  ^  X, 

a2  \ 

S  <  \ 

If  x  a  <  d  no  check 

P  2 

else  cherk  side  1 

If  x  b~<  d  no  check 

(d) 

X  >  X, 

■l  bl 

y.,  >  \ 

If  x  b,  <d  no  check 

P  1 

else  check  side  2 

(d) 

P  2 

else  check  side  2 

If  x  b  <  d  no  check 

Xa2  <  \ 
ya2  <  yb. 

(e) 

\  ‘  \ 

If  x  a  <  d  no  check 

P  1 

(e) 

P  2 

else  check  side  2 

Xa,  "  Xb 

else  check  side  1 

i  i 

Table 

7.  Geometrical  and 

similarity  conditions 

Table 

6.  Geometrical 

and 

similarity  conditions 

for  Case  4:  end 

2  of  line  A. 

for  Case  4: 

end 

1  of  line  A. 

Figure  1.  A  line  segment  and  strips  along  each  side  of  it 


# 


Figure  11.  Configurations  in  Case  3 


Configurations  in  Case  4:  end  1  of  line  A 


[Figure 18: plots of n1/n2 against θ from 0° to 360°; panels a) 10%, b) 30%, c) 70%.]

Figure 18. Plot of n1/n2 as a function of θ, after noise cleaning.


[Figure 19: plots of n1/n2 against θ from 0° to 360°; panels a) 10%, b) 30%.]

Figure 19. Plot of n1/n2 as a function of θ, before noise cleaning.


[Figure 20: panels a) A group, with label pairs A-E, E-C, C-F, F-A; b) A house; c) An intersection.]

Figure 20. Examples of closed groups.



[Figure 21: panels a) A group, with the gap marked; b) A group; c) A house with missing edge.]

Figure 21. Examples of semiclosed groups.


a)  A  curved  road 
0,  +  0M  >  0  . 


ree  or  four  way  intersection 

+  0.  >  e  . 

I  min 


Figure  22.  Examples  of  possible  cases  of  roads  and  buildings. 




Figure  23.  A  suburban  scene. 


Figure 24. Line segments fitted to the edge
components of the scene of Figure 23.


Figure 25. Histogram of the probability
of "other".



Figure  34.  Antiparallel  lines. 


Figure  36.  Another  suburban  scene. 


Figure 37. Line segments fitted to the edge
components.


Figure  39.  Compatible  lines. 


Figure  41.  Houses  with  good  confidence. 



ATMOSPHERIC MODELLING FOR THE
GENERATION OF ALBEDO IMAGES

Robert W. Sjoberg
Berthold K. P. Horn

MIT Artificial Intelligence Laboratory
545 Technology Square
Cambridge, MA 02139


Abstract

Accurate classification of terrain is important in the management
of natural and other resources. Images obtained from satellite and
other high-altitude observations are useful in this task, but only as they
provide reflectance information about the surface. In areas of rugged
topography, where much of the world's resources are found, the
determination of surface reflectance is hampered by interference from the
atmosphere and by the presence of cast shadows. The sky serves as
a distributed light source which differentially illuminates the surface
depending on local surface orientation. Path radiance, which includes
reflection of sunlight off the atmosphere directly to an observation
platform, adds a significant "noise" component to the measured scene
radiance. The research reported herein demonstrates that although
understanding the effects of the atmosphere is a complex task, the
adoption of even simple models can provide substantial improvement over
results obtained with no model. In particular, it is shown how the
abundance of cast shadows in mountainous regions aids in the
determination of path radiance. Sky illumination models are also
presented.

1. Introduction

Four major factors interact in any imaging situation:

• the geometrical arrangement of the objects and the viewer

• the spatial (and spectral) distribution of incident illumination

• the photometry of the surface (its reflectance properties)

• the topography (shape) of the surface viewed

In theory, knowing three of these factors often
enables us to obtain the fourth from a given image. Work on extracting

’!■!'  w””dh"  1S7‘*  w»“- 
rasIM  i.  chliiSr  M  lmi  h“  bOT 

One useful variation is the determination of surface reflectance
given the surface shape. Applications to remote
sensing immediately come to mind. The essential task facing an image
interpreter or an automated classification system in large-scale terrain
mapping is to understand radiance measurements made by
high-altitude aircraft or spacecraft, and produce a description of the surface.
Much of the earlier interpretation of conventional satellite imagery used
radiance values directly as input to a classifier. Some attempts
were quite successful over large agricultural areas in the midwestern
United States [Henderson 1975]. Most suffered from large variations
I  fRocho^978L  'd'UCS  occurrc<i  *n  areas  of  even  moderate  relief 

If one could correctly account for all effects in the image other
than surface reflectance, then it should be possible to create an albedo
image as input to an automated classifier. The albedo image
characterizes intrinsic properties of the surface, independent of local
topography, illumination, or satellite position, and should lead to more
accurate results. Related work by Horn [Horn 1978] in hillshading has
demonstrated the feasibility of creating synthetic images using models
of source illumination and surface topography.

In order to correctly model the satellite remote sensing situation
one must consider a fifth factor, one that is not generally significant
in terrestrial imaging: the effect of the intervening atmosphere. Its
qualitative effects have been known for a long time and are
well-documented [Minnaert 1954]. The atmosphere attenuates radiation due
to scattering and absorption. To an observer on the ground, the
scattered radiation gives rise to the familiar blue sky. To an observer in
space, the scattered component contributes "noise" in the form of path
radiance or air glow. This scattering also serves as a blurring agent,
which redirects reflected ground radiation from outside the imaged
target into the observation path [Otterman and Fraser 1979]. The
magnitude of these effects depends on wavelength. Absorption is relatively
minor in the near-infrared and visible wavelengths to which satellites of
the popular Landsat series are sensitive, but scattering increases at
the shorter wavelengths. All four effects (attenuation, sky radiance,
path radiance, and background radiance) should be represented in an
atmospheric model in order to construct accurate albedo maps.

Our research concentrates on areas of rugged topography. This
focus stems naturally from the earlier shape from shading and hill
shading work, and it is here that albedo maps should provide the
most benefit. Mountainous regions are important, since a major use
of Landsat imagery is in mapping natural resources, particularly forests
and watersheds, which usually occur in areas of significant relief.

2.  Development  of  an  Engineering  Model 

2.1  Role  of  Radiative  Transfer 

Any investigation into understanding the nature of atmospheric
effects must reckon with the field of radiative transfer through
distributed media. The study of radiative transfer is an enormous and
complex domain. Out of early theoretical work on clear and hazy
atmospheres [Rayleigh 1871; Schuster 1905; Schwarzschild 1906] grew
scattering theory. Astrophysicists studying stellar and planetary
atmospheres worked out the first mathematical formulations of the problem
as a set of non-linear integrodifferential equations in three dimensions
[Chandrasekhar 1960; Sekera 196S]. An analytic solution has been
found only for the rather restricted case of isotropic or molecular
scattering in a semi-infinite or finite, horizontally homogeneous planar
atmosphere [Chandrasekhar 1960]. Numerical methods provide inroads
to a solution as well as somewhat clearer insight into the physical
principles involved. Research in this area, especially applied to satellite
remote sensing and communications, has generated a vast literature
[Howard, Garing 1971; LaRocca, Turner 1975; Rozenberg 1960], several
annual symposia, and a host of multidisciplinary journals.

2.2 Simple Models Permit Engineering Approach

One goal of this research is to construct an engineering model of
the satellite imaging process. It should be computationally simple, rely
as much as possible on the image data itself and very little on a priori
or a posteriori information about particular scenes, and provide
adequate accuracy for the application. The full set of radiative transfer
equations is far too complicated for our purposes, and unnecessarily
cumbersome. On the other hand, numerical methods such as doubling
optically thin layers [Hansen 1969] or Monte Carlo techniques [Plass,
Kattawar 1968] require knowledge of boundary conditions (including
the surface reflectance we are trying to find!), complete specification of
the atmospheric state (generally unknown or ill-determined), and take
considerable amounts of computation time to yield a result for a single
set of conditions. A more productive approach is the use of simple
approximate models which are manipulable and have a small number of
parameters to be determined by the data.

2.3  Peculiarities  of  Rugged  Terrain 

There are certain atmospheric effects in areas of mountainous
terrain due to elevation differences and the consequent variation in the
amount of air between ground and observation platform. With
increasing elevation, one expects

• increasing incident solar irradiance

• increasing reflected (target) radiance

• decreasing path radiance

• decreasing sky irradiance

The first two result because there is less attenuating material at the
higher elevations. The last two result because there is less scattering
material at higher elevations.

Hill slopes also cause the target irradiance to vary across the
image.

• surfaces tilted toward the sun receive more energy than those
tilted away

• surfaces with large slope see less of the sky than flat surfaces and
are irradiated correspondingly less.

Finally,  at  the  relatively  low  sun  angles  typical  of  Landsat  images, 
there  should  be  significant  areas  in  shadows  cast  by  neighboring  hills. 
Since  the  sky  is  the  only  source  of  illumination  in  shadows  (both  cast 
shadows  and  self-shadows),  it  is  important  to  understand  its  contribu¬ 
tion. 


3. Satellite Imaging Equation

3.1 Relation of Reflectance to Satellite Radiance

The image-forming process considered here is shown in Figure 1.
The satellite platform is assumed to be at infinity and views the surface
normally. Its instantaneous field of view (IFOV) defines a surface
element of area A at some altitude with surface normal making an angle
$\theta_n$ with respect to the vertical. This element is illuminated directly by
the sun, whose extraterrestrial solar irradiance $E_0$ is attenuated by a
factor $T_d$. Of course, there is no direct illumination if A lies in shadow. The
element is also irradiated diffusely by the sky with total sky irradiance
$E_s$. $E_s$ depends on the surface slope, and both $E_s$ and $T_d$ depend in
general on altitude.

Assume that the surface is a Lambertian reflector with albedo $\rho$,
$0 < \rho < 1$ (albedo as used here is the bihemispherical reflectance
defined in [Nicodemus et al. 1977]; the proper bidirectional reflectance
distribution function for a Lambertian reflector is
$f_r(\theta_i, \phi_i; \theta_r, \phi_r) = \rho/\pi$). Many substances are
approximately Lambertian, and the assumption makes calculations
reasonable, since the reflected radiance is independent of emittance
angle and is proportional to the total irradiance.

The target element A emits a radiance $L_t$ in the zenith direction,
which is attenuated by the factor $T_u$ and augmented by the path
radiance $L_p$. The radiance $L_m$ actually measured by the satellite is
therefore

$$L_m = L_t T_u + L_p = \frac{\rho}{\pi}\,(E_0 T_d S + E_s)\,T_u + L_p$$

The factor $S$ accounts for A being in shadow and for the reduced solar
irradiance due to foreshortening:

$$S = \begin{cases} 0, & \text{if pixel is in cast shadow;} \\ \cos\theta_{i0}, & \text{if } \cos\theta_{i0} > 0; \\ 0, & \text{otherwise.} \end{cases}$$

$\theta_{i0}$ is the angle between the surface normal in polar direction
$(\theta_n, \phi_n)$ and the sun's direction $(\theta_0, \phi_0)$:

$$\cos\theta_{i0} = \cos\theta_n \cos\theta_0 + \sin\theta_n \sin\theta_0 \cos(\phi_n - \phi_0)$$

(The reference line for azimuthal angles $\phi$ is arbitrary, but lies in the
plane of the earth's surface.) The albedo of the surface at the imaged
point can be written

$$\rho = \frac{\pi\,(L_m - L_p)}{(E_0 T_d S + E_s)\,T_u}$$

We assume that $L_m$ (converted from its digitized value to the
appropriate radiance in mW cm$^{-2}$ sr$^{-1}$), $E_0$, and $S$ (computed from the
shadow and surface slope models) are available. The remaining four
parameters ($T_d$, $T_u$, $L_p$, and $E_s$) depend on the actual atmospheric
model adopted.
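
The imaging equation and its inversion for albedo can be sketched as below. This is a minimal illustration under stated assumptions: all angles are in radians, the function and argument names are hypothetical, and the atmospheric quantities are simply passed in as numbers rather than derived from any particular atmospheric model.

```python
import math

def cos_incidence(theta_n, phi_n, theta_0, phi_0):
    """Cosine of the angle between the surface normal (theta_n, phi_n)
    and the sun's direction (theta_0, phi_0), both in polar form."""
    return (math.cos(theta_n) * math.cos(theta_0)
            + math.sin(theta_n) * math.sin(theta_0) * math.cos(phi_n - phi_0))

def shadow_factor(theta_n, phi_n, theta_0, phi_0, in_cast_shadow):
    """Factor S: zero in cast shadow or self-shadow, else the cosine."""
    if in_cast_shadow:
        return 0.0
    return max(0.0, cos_incidence(theta_n, phi_n, theta_0, phi_0))

def measured_radiance(rho, L_p, E_0, T_d, T_u, E_s, S):
    """Forward model: L_m = (rho / pi) (E_0 T_d S + E_s) T_u + L_p."""
    return (rho / math.pi) * (E_0 * T_d * S + E_s) * T_u + L_p

def albedo(L_m, L_p, E_0, T_d, T_u, E_s, S):
    """Inverse: rho = pi (L_m - L_p) / ((E_0 T_d S + E_s) T_u)."""
    return math.pi * (L_m - L_p) / ((E_0 * T_d * S + E_s) * T_u)
```

Running `measured_radiance` and then `albedo` with the same parameters returns the starting albedo, which is a quick consistency check on the two equations.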

3.2  General  Assumptions 

The discussion which follows assumes geometrical ray optics, a
point source, distant source and observer, and monochromatic light
or the equivalent band-averaged quantity. It ignores polarization,
atmospheric absorption, and mutual illumination of surface elements by
neighboring ones as in valleys.


1-' 1  1,1  "f  I'opograpliic  Models 

'lie  region  Chosen  for  the  study  ,s  a  j;  5  km  h.  ,,  -  . 

(H  southwestern  Suu/Cr|.inj  (lir  tth  ,  ..  ,  >210  km  pan 

m  \  i  ^  C  11  olcMiion  mode!  (1)1 

, ...  .  dS,lt  "™pcs  ,ud  bt>c"  ohumed.  lire  l)|,M  is  ;ln  arras  of  ]  75  by 

SndVk!T  '■'h‘Cshi'rUai'  ^'••'mr/ation  is  1(1  meters)  on  a  100  meter 
end.  I.kv.iiions  range  from  4J0  m  me  Rhone  River  sallcs  to  »m  l 
011  the  summit  *ifdc.s  Dmhlercis  Sitelfnp  i  f  c>  to  3.10m 

stvnc  UP  XDOSCs  ,  S‘Ucll,K  data  is  a  portion  of!  ;mds»t  I 

K  S  09555  squired  dh«Hit  9:5SGMrOuohcr9  ig-n 
JI  tunc  h,,d  .in  c!e\ ;n inn  on.t  2  dcpr  '  '  "  1  SUn 

dcarm  llhun  I97s  h  ,r  u  .  d^ts- ;,n^  »n  .i/muiih  of  154.8 
* |m,rn  <\  nnrn.WdoJh.im  M)70hl  u.j 
u pri  111  ,y  yo|  Kadiomctric  corrections 

I9-TR.il  I  he  'st  '»*>  TT  TCmaT  Stnpinp  cffccK  |,,<)rn*  Wnodham 

scene  w,,s  then  re^siered -o  ,h  hi  m 
sls'suibed  in  lllorn.  lU'hmm  l‘PSI  i  it.  i  "sing  the  method 
ner  hand  4  (MSS -4  0  5  Of.  P  ‘  "mlti'pcttral  scan- 

1  W'^  (h  °C/im.  green  ycllovs)  for  the  scene. 

A  digiul  slope  model  was  eonstnicted  front  the  HI  M  h„ 

. •— . . 

£■  sst . r-  ™e  l 

F  °  4.  O  mask  point  indicates  the  pixel  is  in  shades* 

S£==t 

since  their  irradiatte^w^S^ 

»=£«as^-« 


reader  ,  reierm  ?  ‘h°  lo'*er  1  «"  ’  km  -fair  Hie 

^-esenedthe^  ^ 

Pl-n..r.  h.3STn!fi3^d'V'  7  'm",,'phero  *hich  is 

nr  non  ahsorhing  sc  ntcrers  ,n  a"np,"cd  1,1  two  kinds 

^.od  me  ,Jo7'ZZZ  TTZ  ^  «*"«• 
the  Kasleiph component)  I  ich  f  ^  "  l,r  f"u,lcr  altitudes  (called 

-*  «■  K-  *..*771,;:';,';: K  77* 

"*«»  i*»  oawoMia ^  * 
%)  »hich  goes  ,he  fncuon  of  ,,  ,  CX‘nK,:on  coefficient 

* — 7  ^zr7rM  m* 

and  the  fraction  ..... ...  ,  '  “  ™'nE““M  f™»  an  incident  bean, 

ST  "• 


$$\beta_R(z) = \beta_R(0)\,e^{-z/H_R} \quad\text{and}\quad \beta_A(z) = \beta_A(0)\,e^{-z/H_A}$$


4.  Modeling  Path  Radiance 


4.1 Constant Path Radiance Model

™  r  *  r  ™: r~ » » 

r  -  sriSi-jfi  - 

)’  d  removc  1110  hazy  appearance  of  the  image. 

4.2 Elevation-Dependent Path Radiance

siiisHs— 

considerably  depending  on  locale,  even  across  Zr^ZcZ 


ne  reference  altitude  is  arbitary,  but  we  chose  »  »  n  „  ,  . 

rCSPCC[iVC  “•« 

component  then  has  the  form  mCnl  at  3  Uludc  ^  for  cach 

Lp(z o)  =  ^£ohl  _e— ?r(2o)j 

where  Eo  is  die  extraterrestrial  solar  irradiancc  a  =  l  4.  s  , 
rfe)  is  the  apnea! depth  to  altitude*,  defined  by  +  ' 


$$\tau(z) = \int_z^\infty \beta(z')\,dz' = \int_z^\infty \beta(0)\,e^{-z'/H}\,dz' = \beta(0)\,H\,e^{-z/H} = \tau(0)\,e^{-z/H}$$

The upward and downward attenuation factors can be written in terms
of the optical depth as well:

$$T_u(z) = e^{-\tau(z)} \quad\text{and}\quad T_d(z) = e^{-\tau(z)\sec\theta_0}$$
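
A small numerical sketch of these relations follows. It assumes the exponential form of the optical depth given above; the function names are hypothetical, the sun's zenith angle is in radians, and summing the two component optical depths into a single total is an assumption made here for illustration. The constants are the parameter values quoted in Section 4.3.

```python
import math

# Sea-level optical depths and scale heights (m) quoted in Section 4.3.
TAU_R0, H_R = 0.0981, 9500.0   # Rayleigh (molecular) component
TAU_A0, H_A = 0.1831, 1800.0   # aerosol component

def optical_depth(z, tau0, scale_height):
    """tau(z) = tau(0) exp(-z / H) for one exponentially distributed component."""
    return tau0 * math.exp(-z / scale_height)

def total_optical_depth(z):
    """Sum of the two component optical depths (an assumption of this sketch)."""
    return optical_depth(z, TAU_R0, H_R) + optical_depth(z, TAU_A0, H_A)

def transmission_up(tau):
    """T_u = exp(-tau), for the vertical path to the satellite."""
    return math.exp(-tau)

def transmission_down(tau, theta_0):
    """T_d = exp(-tau sec(theta_0)), for the slant path from the sun."""
    return math.exp(-tau / math.cos(theta_0))
```

Because the aerosol scale height is much smaller than the Rayleigh one, the aerosol contribution falls off quickly with elevation in this model, so most of the change in attenuation across mountainous terrain comes from that component.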
The total path radiance is the sum of the two component radiances:

$$L_p(z_0) = L_{pR}(z_0) + L_{pA}(z_0)$$

.l^'gi2  fM  f!ip(  ■  4-r ,(A,))] 

4.3  Finding  Path  Radiance  Parameters 

For our two-component model, there are four parameters to determine
from the image data. One method is to use clear water lakes as
standard, low albedo reflectors [Ahern et al. 1977]. If they were ideal
absorbers, the measured radiance would indeed be all path radiance.
Unfortunately, not all images will have sufficiently many clear water
lakes, nor are the lakes likely to be distributed conveniently through the
altitude range.

Areas in cast shadows, on the other hand, are common in mountainous
regions, and provide an alternative way to determine path
radiance. Shadowed pixels are generally scattered throughout the image
at all but the highest elevations. They are illuminated only by the sky,
but if their albedo is small, one may take the recorded satellite data as a
fair value of the path radiance for the corresponding elevation.

Figure 7 shows a scattergram of the radiance values for all shadowed
pixels in the MSS 4 image. The straight line at the bottom is
from the constant path radiance model mentioned earlier. The curve is
considered a good fit to the data for the $L_p$ equation given above.

The parameter values are $\tau_R(0) = 0.0981$, $\tau_A(0) = 0.1831$, $H_R = 9500$ m,
and $H_A = 1800$ m. The curve is not very sensitive to the
precise choice of parameters. The optical depths at sea level were taken
from tabulated values for molecular and aerosol components [Valley
1965] and the scale heights adjusted for an eyeball fit (a slight gap was
left since the surface does not really have zero albedo). There is no
point to modelling the upward turn of the minimum radiances at higher
elevations, since these pixels are mostly covered with snow. Their high
albedo results in large values of reflected sky radiation even though the
sky is not as bright. There is also considerable radiance due to mutual
illumination and multiple scattering from the surrounding bright
background.
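As a rough illustration of how the quoted parameters behave with altitude, the following sketch evaluates the exponential optical depth $\tau(z) = \tau(0)\,e^{-z/H}$ for each component. The function and constant names are hypothetical, and the sample elevations are arbitrary choices for illustration, not values taken from the scene.

```python
import math

# Parameter values quoted in the text.
TAU_R0 = 0.0981   # sea-level optical depth, Rayleigh (molecular) component
TAU_A0 = 0.1831   # sea-level optical depth, aerosol component
H_R = 9500.0      # Rayleigh scale height, in metres
H_A = 1800.0      # aerosol scale height, in metres


def optical_depth(z, tau0, scale_height):
    """Optical depth above altitude z (metres): tau(z) = tau(0) * exp(-z / H)."""
    return tau0 * math.exp(-z / scale_height)


# Illustrative elevations only (hypothetical, not taken from the image).
for z in (0.0, 1000.0, 2000.0, 3000.0):
    tau_r = optical_depth(z, TAU_R0, H_R)
    tau_a = optical_depth(z, TAU_A0, H_A)
    print(f"z = {z:6.0f} m   tau_R = {tau_r:.4f}   tau_A = {tau_a:.4f}")
```

Because the aerosol scale height is much smaller than the Rayleigh one, the aerosol optical depth falls off far faster with elevation, which is consistent with the larger path radiance expected over valleys.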

No image is included here to illustrate this improved path radiance
model. The differences between it and the constant model (Figure 6)
are too subtle to survive the printing process. The major difference is
the appearance of slightly darker valleys. This is to be expected, since
subtracting a single minimum value from every point in the image fails
to compensate adequately for the larger path radiance at lower altitudes.

The exponential model of path radiance is used throughout the
remainder of this paper, although it is unlikely that large discrepancies
would result from using a constant value instead.

5.  Models  of  Sky  Radiance 


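\[ E_s(z_0, \theta_n, \phi_n) = \int_0^{2\pi}\!\int_0^{\theta_H(\phi)} L_s(z_0, \theta, \phi)\,\cos\theta_i\,\sin\theta\,d\theta\,d\phi \]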


The geometry is illustrated in Figure 8. The double integral is over
the entire hemisphere of sky to which the surface element is exposed. It
is implicit in the equation that only downwelling radiance from the sky
is included, that is, $L_s$ for $\theta > \pi/2$ is zero. The factor $\cos\theta_i$ accounts
for the foreshortening of the surface as the integration direction deviates
from $(\theta_n, \phi_n)$ and is given by

\[ \cos\theta_i = \cos\theta_n \cos\theta + \sin\theta_n \sin\theta \cos(\phi_n - \phi) \]

The upper limit on polar angle $\theta$ is a non-trivial function of the azimuthal
angle $\phi$, and reflects the situation illustrated in Figure 9 where part of
the sky behind the surface element is cut off.


\[ \cot\theta_H(\phi) = \begin{cases} 0, & \text{if } |\phi_n - \phi| < \pi/2 \\ -\tan\theta_n \cos(\phi_n - \phi), & \text{if } |\phi_n - \phi| > \pi/2 \end{cases} \]


Except for relatively simple expressions of the sky radiance, the irradiance
integral is exceedingly difficult to solve. Fortunately, the assumption of
isotropic scattering implies that all sky radiance functions we consider
are independent of azimuthal angle $\phi$. It also means that irradiance
from the sky on any surface element is independent of the azimuthal
angle $\phi_n$.

5.2 Uniform Hemispherical Sky

If the sky radiance function $L_s$ is a constant, then the sky irradiance
integral is easily evaluated [Horn and Sjoberg 1979; Brooks
1978; Moon and Spencer 1942]

\[ E_s(z_0, \theta_n) = \frac{\pi L_s}{2}\,(1 + \cos\theta_n) \]

This function was used to generate the albedo image shown in Figure
10. The brightest areas correspond to albedos of 0.73 or larger, dark
areas to 0.0. The constant $L_s$ was chosen to be 0.83 mW cm^-2 sr^-1,
a value calculated so that a flat surface would have an irradiance
comparable to measured values [Rogers 1973]. The improvement over the
naively made albedo image is striking. For one thing, shadows now
contain useful information that was not apparent in the original Landsat
data. Snow fields on the glacier in the northeast (upper left) corner
which are shadowed are visible as high-albedo regions. Much of the
terrain now appears somewhat flattened because the shading variations
which give clues to shape are diminished or eliminated. Of course,
there are many isolated high-albedo points which result from noise in
the data, especially on the ridge lines, where slope information is very
coarse or outright wrong.
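The closed form for a uniform hemispherical sky can be checked numerically. The sketch below is a minimal illustration, not the authors' implementation: it compares $(\pi L_s/2)(1 + \cos\theta_n)$ with a midpoint-rule evaluation of the sky irradiance integral, taking $\phi_n = 0$ without loss of generality and treating sky directions behind the tilted surface (where $\cos\theta_i < 0$) as contributing nothing. Function names and the grid size are assumptions.

```python
import math


def uniform_sky_irradiance(L_s, theta_n):
    """Closed form for a uniform sky: E_s = (pi * L_s / 2) * (1 + cos(theta_n))."""
    return 0.5 * math.pi * L_s * (1.0 + math.cos(theta_n))


def numeric_sky_irradiance(L_s, theta_n, steps=400):
    """Midpoint-rule integral of L_s * cos(theta_i) * sin(theta) over the sky.

    The surface normal is tilted by theta_n toward azimuth 0. Only the upper
    hemisphere (theta < pi/2) is sampled, and directions with cos(theta_i) <= 0
    lie behind the surface and are skipped.
    """
    d_theta = (math.pi / 2.0) / steps
    d_phi = (2.0 * math.pi) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        for j in range(steps):
            phi = (j + 0.5) * d_phi
            cos_i = (math.cos(theta_n) * math.cos(theta)
                     + math.sin(theta_n) * math.sin(theta) * math.cos(phi))
            if cos_i > 0.0:
                total += cos_i * math.sin(theta)
    return L_s * total * d_theta * d_phi


if __name__ == "__main__":
    L_s = 0.83  # mW cm^-2 sr^-1, the constant quoted in the text
    for degrees in (0, 30, 60):
        tilt = math.radians(degrees)
        print(degrees, uniform_sky_irradiance(L_s, tilt),
              numeric_sky_irradiance(L_s, tilt))
```

For a flat surface the closed form reduces to $\pi L_s$, so the irradiance falls monotonically as the slope steepens.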


5.1  Effects  of  Sky  on  Scenes  with  Mountain  Slopes 

The slopes found in rugged terrain certainly affect image formation.
A tilted surface is exposed differentially to the sky, and receives
more or less radiation as the magnitude of its slope changes. Consider
an element of surface at altitude $z_0$ with surface normal pointing in
direction $(\theta_n, \phi_n)$. Let the distribution of sky radiance for an observer at
altitude $z_0$ be given by $L_s(z_0, \theta, \phi)$. Then the total irradiance due to sky
illumination is [Horn and Sjoberg 1979]


5.3 Elevation-Dependent Hemispherical Sky

One improvement to the uniform hemispherical sky model made
at only a slight cost in computation is to make the sky radiance $L_s$ a
function of elevation. The sky is brighter at lower altitudes, and sky
irradiance should be greater. A simple model for elevation-dependent
sky radiance is $L_s(z_0, \theta) = C\,L_z(z_0)$, where $L_z(z_0)$ is the zenith sky
radiance measured by an observer at elevation $z_0$. Except at extreme sun
angles, the zenith sky radiance differs from the path radiance by only a
few percent. $C$ is a constant which may be varied for the best effect.


h . 


Figure 11 is an albedo map for $C = 3.74$, a value selected so that
the total irradiance of a flat element at an average altitude would match
experimental values. There are no readily apparent differences between
the altered model and the constant one. Close examination reveals that
points at low altitudes have slightly lower albedos with the
elevation-dependent model, due to the increased sky irradiance. Similarly, points
at high altitudes have slightly higher albedos because of the reduced sky
irradiance.


Cast shadows can be used to find parameters for path radiance. The
path radiance so determined may be used in the expressions for sky
irradiance. Even simple models of path and sky radiance reveal more
than is apparent in the raw satellite image. If the results obtained so far
in this research are a good indication, the important conclusion may not
be so much in the choice of model, but that some model of atmospheric
effects be incorporated into the interpretation phase.


5.4 Sky Radiance for an Infinite Planar Sky

The two previous sky models are easy to compute, but do not
reflect a very realistic atmosphere. To an observer on the ground,
the real sky becomes considerably brighter toward the horizon. This
suggests one regard the atmosphere as the plane parallel, horizontally
homogeneous and infinite model described in section 4.2 above. Suppose
that the sky radiance in direction $(\theta, \phi)$ were proportional to the
number of scatterers in the slant path. Since this varies as the secant
of the zenith angle, $L_s(z_0, \theta) = L_z(z_0) \sec\theta$. The constant of
proportionality here is once again the zenith radiance. For a flat surface, this
form of $L_s$ integrates out to $2\pi L_z$. For tilted surfaces, the integral
blows up as $\theta$ approaches $\pi/2$. It is better to adopt a bounded sky
radiance,

\[ L_s(z_0, \theta) = \begin{cases} L_z(z_0) \sec\theta, & \text{if } \theta < \theta_m \\ L_z(z_0) \sec\theta_m, & \text{if } \theta > \theta_m \end{cases} \]

With a suitable approximation in the integral, the sky irradiance may be
worked out.


£**(*>  A)  =  2Lc(at))^C|  cos0M  +  Ci  si  n  0n 

+  (  —  0m)  cos  0rl  -  ?  cos2  0„  si  n  0n^ 

where the constants $C_1$ and $C_2$ depend on $\theta_m$:


?r 

COS  I 

4 

j  -  K 

cos  0m 


Tm 

—  sin  0rn  —  In 


1  —  sin  0m 
1  +  sin  0m 


Figure 12 shows the above sky irradiance relative to the zenith radiance
as a function of $\theta_n$ for $\theta_m = 85$ degrees. For a given $\theta_n$, the variation
with altitude is identical to the dependence of the zenith radiance (that
is to say the path radiance, which is almost the same). An albedo map
constructed using this form for sky irradiance is shown in Figure 13.
Once again, there are no striking differences between it and the previous
two models. There seems to be slightly better definition of some of the
lower areas, and more snow cover is picked up with the secant model.


5.5  More  Sophisticated  Models 

The form for $E_s$ developed in the previous section indicates how
solving the sky irradiance equation can be difficult and tedious.
Progressively more sophisticated models prove disproportionately more
awkward, even with approximate methods. We are continuing research
into more realistic yet (we hope) computationally tractable formulations.


6.  Evaluation 

It is clear that some sort of atmospheric correction must be applied
before albedo maps can be reliably generated for mountainous terrain.


7.  References 

F. J. Ahern, D. G. Goodenough, S. C. Jain, and V. K. Rao (1977).
Use of clear lakes as standard reflectors for atmospheric measurements.
Proc. 11th Intl. Symp. Remote Sensing of the Environment, Ann Arbor,
25-29 April, pp. 731-755.

M. J. Brooks (1978). Investigating the effects of planar light sources.
CSM 22, Department of Computer Science, Essex University, Colchester,
England.

S. Chandrasekhar (1960). Radiative Transfer. New York: Dover
Publications.

R. S. Fraser and R. J. Curran (1976). Effects of the atmosphere on
remote sensing. Chapter 2 of Remote Sensing of Environment, J.
Lintz, Jr. and D. S. Simonett, eds. Reading: Addison-Wesley.

R. G. Henderson (1975). Signature extension using the MASC algorithm.
Proc. Second Symp. on Machine Processing of Remotely
Sensed Data, W. Lafayette, IN, 3-5 June.

M. D. Fleming and R. M. Hoffer (1979). Machine processing of
LANDSAT MSS data and DMA topographic data for forest cover
type mapping. Proc. Fifth Symp. on Machine Processing of Remotely
Sensed Data, West Lafayette, IN, 27-29 June, pp. 377-390.

R. M. Hoffer and staff (1975). An Interdisciplinary Analysis of Colorado
Rocky Mountain Environments Using ADP Techniques. Alternate
title: Natural Resource Mapping in Mountainous Terrain by Computer
Analysis of ERTS-1 Satellite Data. Laboratory for Applications of
Remote Sensing Information Note 061575, Purdue University, June.

B. K. P. Horn (1975). Determining shape from shading. Chapter 4
of The Psychology of Computer Vision, P. Winston, ed. New York:
McGraw-Hill.

B. K. P. Horn (1978). The position of the sun. Working Paper 162,
Artificial Intelligence Laboratory, M.I.T., March.

B. K. P. Horn (1979). Hill-shading and the reflectance map.
Proceedings: Image Understanding Workshop, Palo Alto, CA, 24-25 April,
pp. 79-120.

B. K. P. Horn and B. L. Bachman (1978). Using synthetic images
to register real images with surface models. CACM, 21:914-924,
November.

B. K. P. Horn and R. W. Sjoberg (1979). Calculating the reflectance
map. Appl. Opt., 18:1770-1779, 1 June.

B. K. P. Horn and R. J. Woodham (1979a). Destriping LANDSAT
MSS images by histogram modification. Computer Graphics and
Image Processing, 10:69-83.

B. K. P. Horn and R. J. Woodham (1979b). LANDSAT MSS coordinate
transformations. Proc. Fifth Symp. on Machine Processing of
Remotely Sensed Data, W. Lafayette, IN, 27-29 June, pp. 59-68.




B. K. P. Horn, R. J. Woodham, and W. M. Silver (1978). Determining
shape and reflectance using multiple images. A.I. Memo 490,
Artificial Intelligence Laboratory, M.I.T., August.

J. N. Howard and J. S. Garing (1971). Atmospheric optics and radiative
transfer. Trans. Am. Geophys. Union, 52:371-389, June.

K. Ikeuchi (1979). Numerical shape from shading and occluding
contours in a single view. A.I. Memo 566, Artificial Intelligence
Laboratory, M.I.T., November.

K. Ikeuchi and B. K. P. Horn (1979). An application of the photometric
stereo method. A.I. Memo 539, Artificial Intelligence Laboratory,
M.I.T., August.

A. J. LaRocca and R. E. Turner (1975). Atmospheric Transmittance
and Radiance: Methods of Calculation. IRIA State-of-the-Art Report,
Environmental Research Institute of Michigan, report no. ERIM
107600-10-T, June.

E. J. McCartney (1976). Optics of the Atmosphere. New York: John
Wiley and Sons.

R. A. McClatchey, R. W. Fenn, J. E. A. Selby, F. E. Volz, J. S.
Garing (1978). Optical properties of the atmosphere. Section 14 of
Handbook of Optics, W. G. Driscoll, ed. New York: McGraw-Hill.

G. Mie (1908). A contribution to the optics of turbid media,
especially colloidal metallic suspensions. Ann. Phys., 25:377-445.

M. Minnaert (1954). The Nature of Light and Color in the Open Air.
New York: Dover.

P. Moon and D. E. Spencer (1942). Illumination from a non-uniform
sky. Illuminating Engineering, XXXVII:707-726, December.

F. E. Nicodemus, J. C. Richmond, J. J. Hsia, I. W. Ginsberg, T.
Limperis (1977). Geometrical Considerations and Nomenclature for
Reflectance. National Bureau of Standards, NBS Monograph 160,
U.S. Department of Commerce.

G. N. Plass and G. W. Kattawar (1968). Calculations of reflected and
transmitted radiance for earth's atmosphere. Appl. Opt., 7:1129-1135,
June.

J. B. Pollack, D. Colburn, R. Kahn, J. Hunter, W. Van Camp, C. E.
Carlston, M. R. Wolf (1977). Properties of aerosols in the Martian
atmosphere as inferred from Viking lander imaging data. J. Geophys.
Res., 82:4479-4496, September 30.

J. B. Pollack (1967). Rayleigh scattering in an optically thin
atmosphere and its application to Martian topography. Icarus, 7:42-46.

Lord Rayleigh (1871). On the scattering of light by small particles.
Phil. Mag., 41:447-454.

G. Rochon, H. Audirac, F. J. Ahern, and J. Beaubien (1978). Analysis
of a transformation model of satellite radiances into reflectances.
Proc. 12th Intl. Symp. Remote Sensing of the Environment, Manila,
Philippines, 20-26 April.

R. H. Rogers (1973). Investigation of Techniques for Correcting ERTS
Data for Solar and Atmospheric Effects. National Aeronautics and
Space Administration, Report NASA-CR-132860, August.

R. H. Rogers, K. Peacock, and J. J. Shah (1973). A technique for
correcting ERTS data for solar and atmospheric effects. Third Symp.
on Significant Results Obtained from ERTS-1, Washington, DC, 10-14
December, NASA Report SP 351, pp. 1787-1804.

G. V. Rozenberg (1960). Light scattering in the earth's atmosphere.
Soviet Physics Uspekhi, 3:346-371, Nov.-Dec.

A. Schuster (1905). Radiation through a foggy atmosphere. Astrophys.
J., 21:1-22.

K. Schwarzschild (1906). Göttinger Nachrichten, 41.

Z. Sekera (1968). Radiative transfer and the scattering problem. J.
Quant. Spectrosc. Rad. Trans., 8:17-24.

R. E. Turner (1974). Radiative Transfer in Real Atmospheres.
Environmental Research Institute of Michigan Report ERIM 19100-24-T,
July.

S. L. Valley (1965). Handbook of Geophysics and Space Environments.
New York: McGraw-Hill.

R. J. Woodham (1976). Two simple algorithms for displaying
orthographic projections of surfaces. Artificial Intelligence Laboratory
Working Paper No. 126, M.I.T., August (unpublished).

R. J. Woodham (1978a). Reflectance map techniques for analyzing
surface defects in metal castings. Artificial Intelligence Laboratory
Technical Report TR-457, M.I.T., June.

R. J. Woodham (1978b). Looking in the shadows. Artificial Intelligence
Laboratory Working Paper No. 169, M.I.T., May (unpublished).


[Figure caption fragment: "Imaging geometry for a ... normal to the surface element ... illumination is from ..."]

[Figure caption: Band 4 of the Landsat multispectral scanner image of the
region in Switzerland described in the text. Sun elevation is 34.2
degrees, azimuth 154.8 degrees (counterclockwise from North).]



Figure 7. Scattergram of radiance values from the Landsat data for those
pixels determined to be in shadow. Vertical axis is digitized radiance (a
value of 511 corresponds to 2.48 mW cm^-2 sr^-1). The horizontal axis is
elevation, in 10 m intervals. The straight line represents a constant path
radiance level, and the curve is the altitude-dependent model described
in the text.


Figure  9.  A  tilted  surface  is  exposed  only  to  part  of  the  hemisphere 
overhead. 



Figure 8. The geometry of sky irradiance. The sky in direction $(\theta, \phi)$
contributes a portion $L_s(z_0, \theta, \phi)\,d\omega = L_s(z_0, \theta, \phi) \sin\theta\,d\theta\,d\phi$ through
solid angle $d\omega$ to the total sky irradiance of the surface element at
elevation $z_0$. $\theta_i$ is the angle between the surface normal and the sky
direction.




Figure  10.  Albedo  image  for  a  uniform  hemispherical  sky.  The  brightest 
pixels  correspond  to  albedos  of  0.73  or  larger. 


RANDOM SAMPLE CONSENSUS: A PARADIGM FOR MODEL FITTING WITH
APPLICATIONS TO IMAGE ANALYSIS AND AUTOMATED CARTOGRAPHY

M. A. Fischler and R. C. Bolles

Artificial Intelligence Center
SRI International
Menlo Park, California 94025


In this paper we introduce a new paradigm,
Random Sample Consensus (RANSAC), for fitting a
model to experimental data. RANSAC is capable of
interpreting/smoothing data containing a
significant percentage of gross errors, and thus is
ideally suited for applications in automated image
analysis where interpretation is based on the data
provided by error-prone feature detectors. A major
portion of this paper describes the application of
RANSAC to the Location Determination Problem
(LDP):  given  an  image  depicting  a  set  of  landmarks 
with  known  locations,  determine  that  point  in  space 
from  which  the  image  was  obtained.  We  derive  new 
results  on  the  minimum  number  of  landmarks  needed 
to  obtain  a  solution,  and  present  algorithms  for 
computing  these  minimum-landmark  solutions  in 
closed  form.  These  results  provide  the  basis  for 
an  automatic  system  that  can  solve  the  LDP  under 
difficult  viewing  and  analysis  conditions. 
Implementation  details  and  computational  examples 
are  also  presented. 


INTRODUCTION 

In  this  paper  we  introduce  a  new  paradigm, 
Random  Sample  Consensus  (RANSAC),  for  fitting  a 
model  to  experimental  data:  and  we  illustrate  its 
use  in  scene  analysis  and  automated  cartography. 

The  application  discussed,  the  location 
determination  problem  (LDP),  is  treated  at  a  level 
beyond  that  of  a  mere  example  of  the  use  of  the 
RANSAC  paradigm;  we  present  new  basic  findings 
concerning  the  conditions  under  which  the  LDP  can 
be  solved  and  describe  a  comprehensive  approach  to 
the  solution  of  this  problem  that  we  anticipate 
will  have  near-term  practical  applications. 

To  a  large  extent,  scene  analysis  (and  in 
fact,  science  in  general)  is  concerned  with  the 
interpretation  of  sensed  data  in  terms  of  a  set  of 
predefined  models.  Conceptually,  interpretation 
involves  two  distinct  activities:  first,  there  is 
the  problem  of  finding  the  best  match  between  the 
data  and  one  of  the  available  models  (the 
classification  problem);  second,  there  is  the 
problem  of  computing  the  best  values  for  the  free 
parameters  of  the  selected  model  (the  parameter 
estimation  problem).  In  practice,  these  two 
problems  are  not  independent--a  solution  to  the 
parameter  estimation  problem  is  often  required  to 
solve  the  classification  problem. 


Classical techniques for parameter estimation,
such as "least squares," optimize (according to a
specified objective function) the fit of a
functional description (model) to all of the
presented data. These techniques have no internal
mechanisms for detecting and rejecting gross
errors. They are averaging techniques that rely on
the assumption (the "smoothing assumption") that
the maximum expected deviation of any datum from
the assumed model is a direct function of the size
of the data set, and thus regardless of the size of
the data set, there will always be enough "good"
values to "smooth out" any gross deviations.

In many practical parameter estimation
problems the smoothing assumption does not hold;
that is, the data contains uncompensated gross
errors. To deal with this situation, a number of
heuristics have been proposed. The technique
usually employed is some variation of the idea of
first using all the data to derive the model
parameters; next, locate the datum that is farthest
from agreement with the instantiated model, assume
that it is a gross error, delete it, and iterate
this process until either the maximum deviation is
less than some preset threshold, or until there is
no longer sufficient data to proceed.

It can easily be shown that a single gross
error ("poisoned point"), mixed in with a set of
good data, can cause the above heuristic to fail
(for example, see Figure 1). It is our contention
that averaging is not an appropriate technique to
apply to an "unverified" data set.

In  the  following  section  we  introduce  the 
RANSAC  paradigm,  which  is  capable  of  smoothing  data 
that  contains  a  significant  percentage  of  gross 
errors.  This  paradigm  is  particularly  applicable 
to  scene  analysis  because  local  feature  detectors, 
which  often  make  mistakes,  are  the  source  of  the 
data  provided  to  the  interpretation  algorithms. 

Local feature detectors make two types of errors:
classification errors and measurement errors.
Classification errors occur when a feature detector
incorrectly identifies a portion of an image as an
occurrence of a feature. Measurement errors occur
when the feature detector correctly identifies the
feature, but slightly miscalculates one of its
parameters (e.g., its image location). Measurement
errors generally follow a normal distribution, and
therefore the smoothing assumption applies to them.
Classification errors, however, are gross errors
because they have a significantly larger effect
than measurement errors and they do not average
out.




In the final sections of this paper we discuss
the application of RANSAC to the location
determination problem:

Given a set of "landmarks" ("control points"),
whose locations are known in some coordinate
frame, determine the location (relative to
the coordinate frame of the landmarks) of
that point in space from which an image of
the landmarks was obtained.

We first derive some new results on the
minimum number of landmarks needed to obtain a
solution, and then present algorithms for computing
these minimum-landmark solutions in closed form.
(Conventional techniques are iterative and require
a good initial guess to assure convergence.) These
results form the basis for an automatic system that
can solve the LDP under severe viewing and analysis
conditions. In particular, the system performs
properly even if a significant number of the
landmarks are incorrectly located due to low
visibility, terrain changes, or image analysis
errors. Implementation details and experimental
results are presented to complete our description
of the LDP application.


RANDOM  SAMPLE  CONSENSUS 


The philosophy of RANSAC is opposite to that
of conventional smoothing techniques: rather than
using as much of the data as possible to obtain an
initial solution and then attempting to eliminate
the invalid data points, RANSAC uses as small an
initial data set as feasible and enlarges this set
with consistent data when possible. For example,
given the task of fitting an arc of a circle to a
set of two-dimensional points, the RANSAC approach
would be to select a set of three points (since
three points are required to determine a circle),
compute the center and radius of the implied
circle, and count the number of points that are
close enough to that circle to suggest their
compatibility with it (i.e., their deviations are
small enough to be measurement errors). If there
are enough compatible points, RANSAC would employ a
smoothing technique, such as least squares, to
compute an improved estimate for the parameters of
the circle now that a set of mutually consistent
points has been identified.
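The circle example can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the center is found with the standard circumcenter formula, and compatibility is judged by the radial distance of a point from the circle against a caller-supplied tolerance.

```python
import math


def circle_from_three_points(p1, p2, p3):
    """Return (cx, cy, r) for the circle through three non-collinear points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; no unique circle")
    s1 = x1 * x1 + y1 * y1
    s2 = x2 * x2 + y2 * y2
    s3 = x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    r = math.hypot(x1 - cx, y1 - cy)
    return cx, cy, r


def count_compatible(points, cx, cy, r, tol):
    """Count points whose radial distance from the circle is within tol."""
    return sum(1 for (x, y) in points
               if abs(math.hypot(x - cx, y - cy) - r) <= tol)


if __name__ == "__main__":
    cx, cy, r = circle_from_three_points((1.0, 0.0), (0.0, 1.0), (-1.0, 0.0))
    data = [(0.0, -1.0), (0.7, 0.7), (3.0, 3.0)]
    print(cx, cy, r, count_compatible(data, cx, cy, r, tol=0.05))
```

The count returned by `count_compatible` plays the role of the consensus-set size; the least-squares refinement over the compatible points is left out of this sketch.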

The RANSAC paradigm is more formally stated as
follows:

Given a model that requires a minimum of n
data points to instantiate its free parameters,
and a set of data points P such that
the number of points in P is greater than n
(#(P) > n), randomly select a subset S1 of n
data points from P and instantiate the model.
Use the instantiated model M1 to determine
the subset S1* of points in P that are within
some error tolerance of M1. The set S1*
is called the consensus set of S1.

If #(S1*) is greater than some threshold t,
which is a function of the estimate of the
number of gross errors in P, use S1* to
compute (possibly using least squares) a new
model M1*.

If #(S1*) is less than t, randomly select a
new subset S2 and repeat the above process.

If, after some predetermined number of trials,
no consensus set with t or more members has
been found, either solve the model with the
largest consensus set found, or terminate in
failure.
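The steps above can be sketched in a few lines of code. This is a minimal illustration, assuming a straight-line model y = m*x + c (so n = 2), vertical residuals as the compatibility measure, and a consensus set accepted once it has t or more members; all function and parameter names are hypothetical and none of these choices is prescribed by the paradigm itself.

```python
import random


def fit_line(points):
    """Least-squares fit of y = m*x + c; exact for a two-point sample."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if abs(denom) < 1e-12:
        return None  # degenerate sample (all x equal)
    m = (n * sxy - sx * sy) / denom
    c = (sy - m * sx) / n
    return m, c


def ransac_line(points, tol, t, max_trials, rng):
    """Return (model, consensus_set); model is None on failure."""
    n = 2  # minimum number of points needed to instantiate a line
    best = []
    for _ in range(max_trials):
        sample = rng.sample(points, n)          # random subset S_i
        model = fit_line(sample)                # instantiated model M_i
        if model is None:
            continue
        m, c = model
        consensus = [(x, y) for x, y in points  # consensus set S_i*
                     if abs(y - (m * x + c)) <= tol]
        if len(consensus) > len(best):
            best = consensus
        if len(consensus) >= t:
            # Enough agreement: smooth over the whole consensus set.
            return fit_line(consensus), consensus
    # No consensus set reached t members; fall back to the largest one found.
    if len(best) >= n:
        return fit_line(best), best
    return None, best


if __name__ == "__main__":
    inliers = [(float(x), 2.0 * x + 1.0) for x in range(20)]
    outliers = [(3.0, 40.0), (7.0, -20.0), (11.0, 60.0), (15.0, -35.0), (18.0, 90.0)]
    model, consensus = ransac_line(inliers + outliers, tol=0.1, t=15,
                                   max_trials=100, rng=random.Random(0))
    print(model, len(consensus))
```

The fallback branch corresponds to solving the model with the largest consensus set found; returning failure instead is an equally valid reading of the final step.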


There are two obvious improvements to the
above algorithm: first, if there is a problem-related
rationale for selecting points to form the
S's, use a deterministic selection process instead
of the random one; second, once a suitable
consensus set S* has been found and a model M*
instantiated, add any new points from P that are
consistent with M* to S* and compute a new model on
the basis of this larger set.

The RANSAC paradigm contains three unspecified
parameters: (1) the error tolerance used to
determine whether or not a point is compatible with
a model, (2) the number of subsets to try, and (3)
the threshold t, which is the number of compatible
points used to imply that the correct model has
been found. We discuss methods for computing
reasonable values for these parameters in the
following subsections.

Error Tolerance For Establishing Datum/Model Compatibility

The deviation of a datum from a model is a
function of the error associated with the datum and
the error associated with the model (which, in
part, is a function of the errors associated with
the data used to instantiate the model). If the
model is a simple function of the data points, it
may be practical to establish reasonable bounds on
error tolerance analytically. However, this
straightforward approach is often unworkable; for
such cases it is generally possible to estimate
bounds on error tolerance experimentally. Sample
deviations can be produced by perturbing the data,
computing the model, and measuring the implied
errors. The error tolerance could then be set at
one or two standard deviations beyond the measured
average error.
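One way to read the experimental procedure is sketched below. It is only an illustration under assumptions not stated in the text: the model is a straight line instantiated from two randomly chosen points, the perturbation is zero-mean Gaussian noise added to y, and the deviation is the vertical residual. The names `estimate_tolerance`, `noise_sigma`, and `k_sd` are hypothetical.

```python
import random
import statistics


def estimate_tolerance(points, noise_sigma, trials, k_sd, rng):
    """Estimate an error tolerance as mean deviation + k_sd standard deviations.

    Each trial perturbs the (assumed error-free) points, instantiates a line
    from a random two-point sample, and records every point's deviation from
    that line.
    """
    deviations = []
    for _ in range(trials):
        perturbed = [(x, y + rng.gauss(0.0, noise_sigma)) for x, y in points]
        (x1, y1), (x2, y2) = rng.sample(perturbed, 2)
        if x1 == x2:
            continue  # cannot instantiate y = m*x + c from this sample
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        deviations.extend(abs(y - (m * x + c)) for x, y in perturbed)
    return statistics.mean(deviations) + k_sd * statistics.pstdev(deviations)


if __name__ == "__main__":
    clean = [(float(x), 2.0 * x + 1.0) for x in range(20)]
    print(estimate_tolerance(clean, noise_sigma=0.5, trials=200, k_sd=2.0,
                             rng=random.Random(1)))
```

Because the model is built from a minimal sample, its own error inflates the measured deviations, so the resulting tolerance is noticeably larger than the measurement noise alone.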

The expected deviation of a datum from an
assumed model is generally a function of the datum
and, therefore, the error tolerance should be
different for each datum. However, the variation
in error tolerances is usually relatively small
compared to the size of a gross error. Thus, a
single error tolerance for all data is often
sufficient.

Maximum Number of Attempts to Find a Consensus Set

The decision to stop selecting new subsets of
P can be based upon the expected number of trials k
required to select a subset of n good data points.
Let w be the probability that any selected data
point is within the error tolerance of the model.
Then we have:

\[ E(k) = b + 2(1-b)\,b + 3(1-b)^2\,b + \cdots + i(1-b)^{i-1}\,b + \cdots \]

\[ E(k) = b\,[\,1 + 2a + 3a^2 + \cdots + i\,a^{i-1} + \cdots\,] \]

where E(k) is the expected value of k, $b = w^n$,
and $a = (1-b)$.

An identity for the sum of a geometric series is:

\[ a/(1-a) = a + a^2 + a^3 + \cdots + a^i + \cdots \]

Differentiating the above identity with
respect to a, we have:

\[ 1/(1-a)^2 = 1 + 2a + 3a^2 + \cdots + i\,a^{i-1} + \cdots \]

Thus:


thus; 

E(k2)  *  (2-b)/(b2) 

and: 

SD( k )  »  aqi  t  ( 1  -  wri )  ]*(  1 /wn) 

Ve  note  that  generally  SD(k)  will  be 
approximately  equal  to  E(k):  thus,  for  example,  if 
(w  *  .5)  and  (n  4),  then  E(k)  *  16  and 
SD(k^  *  15-5.  This  means  that  we  might  want  to  try 
two  or  three  times  the  expected  number  of  random 
selections  implied  by  k  (as  tabulated  above)  to 
obtain  a  consensus  set  of  more  than  t  membera. 

From  a  slightly  different  point  of  view,  if  we 
want  to  ensure  with  probability  z  that  at  least  one 
oi  our  random  selections  is  an  error-free  set  of  n 
data  points,  then  we  must  expect  to  make  at  least  k 
selections  (n  data  points  per  selection),  where: 

0-b)k  *  ( 1  —  z ) 


E(k)  *  1/b  =  w’n  k  =  [log( 1 -z) ]/[log(l -b )] 


,The  following  ia  a  tabulation  of  some  values 
of  E^k)  for  corresponding  values  of  n  and  w: 

w  n  =  1  2  3  4  5  6 

•9  1.1  1.2  1.4  1.5  1.7  1.9 

•8  1-3  1.6  2.0  2.4  3.0  3.8 

1*4  2.0  2.9  4.2  5-9  8.5 

■6  1-7  2.H  4.6  7.7  13  21 

.5  2.0  4.0  8.0  16  32  64 

•4  2.5  6.3  16  39  98  244 

•3  3.3  11  37  123  412 

-2  5.0  25  125  625 

In  general,  we  would  probably  want  to  exceed 
E(k)  trials  by  one  or  two  standard  deviations 
before  wo  give  up.  Ve  note  that  the  standard 
deviation  of  k,  SD(k),  is  given  by: 

SD(k)  =*sqrt[E(k2)  -  E(k)2] 

Then: 

E(k2)  *  SIGMA(i):  {b*i2*ai-1 } 

-  SIGMA(i):  jb*i*(i-1 )*ai_1  ) 

-  SIGMA(i):  {b*i*ai~1 } 

but  (using  the  geometric  series  identity  and  two 
differentiations): 

2a  ( 1 -a)5  = 

SIGMA(i):  |i*(i-1 )*ai_1 } 


For  example,  if  (w  -  .5)  and  (n  •  4),  then 
(b  =  1/1 6).  To  obtain  a  ^0%  assurance  of  making  at 
least  one  error-free  aelection, 

k  «  log(.1 )/log(l5/l6)  *35.7 
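The three quantities derived in this subsection are easy to evaluate numerically. The short sketch below (function names are our own, chosen for illustration) computes E(k) = w^(-n), SD(k) = sqrt(1 - w^n)/w^n, and the number of selections k needed for assurance z, and reproduces the worked values for w = .5 and n = 4.

```python
import math

def expected_trials(w, n):
    """E(k) = 1/b with b = w**n."""
    return 1.0 / (w ** n)

def sd_trials(w, n):
    """SD(k) = sqrt(1 - b) / b with b = w**n."""
    b = w ** n
    return math.sqrt(1.0 - b) / b

def trials_for_assurance(z, w, n):
    """k such that (1 - b)**k = 1 - z, with b = w**n (requires 0 < b < 1)."""
    b = w ** n
    return math.log(1.0 - z) / math.log(1.0 - b)

e_k = expected_trials(0.5, 4)            # 16.0
sd_k = sd_trials(0.5, 4)                 # about 15.5
k_90 = trials_for_assurance(0.9, 0.5, 4) # about 35.7
```

In practice k would be rounded up to an integer number of selections.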

A Lower Bound On the Size of an Acceptable
Consensus Set

The  threshold  t,  an  unspecified  parameter  in 
the  formal  statement  of  the  RANSAC  paradigm,  is 
used  as  the  basis  for  determining  that  an  n-subset 
of  P  has  been  found  that  implies  a  sufficiently 
large  consensus  set  to  permit  the  algorithm  to 
terminate.  Thus,  t  must  be  chosen  large  enough  to 
satisfy  two  purposes:  that  the  correct  model  has 
been  found  for  the  data;  and  that  a  sufficient 
number  of  mutually  consistent  points  have  been 
found  to  satisfy  the  needs  of  the  final  smoothing 
procedure  (which  computes  improved  estimates  for 
the  model  parameters). 

To ensure against the possibility of the final
consensus set being compatible with an incorrect
model, and assuming that y is the probability that
any given data point is within the error tolerance
of an incorrect model, we would like y^(t-n) to be
very small. While there is no general way of
precisely determining y, it is certainly reasonable
to assume that it is less than w (w is the a priori
probability that a given data point is within the
error tolerance of the correct model). Assuming
y < .5, a value of t-n equal to 5 will provide a
better than 95% probability that compatibility with
an incorrect model will not occur.
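As a quick check of the figure quoted above, the following sketch finds the smallest margin t-n for which y^(t-n) falls below a chosen risk. The helper name `min_margin` is hypothetical; the only inputs are the bound y and the desired assurance.

```python
def min_margin(y, assurance):
    """Smallest integer d = t - n such that y**d <= 1 - assurance."""
    if not 0.0 <= y < 1.0:
        raise ValueError("y must satisfy 0 <= y < 1")
    d = 0
    while y ** d > 1.0 - assurance:
        d += 1
    return d

margin = min_margin(0.5, 0.95)   # 5, since .5**5 = .03125 and .5**4 = .0625
```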

To satisfy the needs of the final smoothing
procedure, the particular procedure to be employed
must be specified; if least-squares smoothing is to
be used, there are many situations where formal
methods can be invoked to determine the number of
points required to produce a desired precision
(e.g., see Sorenson [1970]).

Example 

Let us apply RANSAC to the example described
in Figure 1. A value of w (the probability that
any selected data point is within the error
tolerance of the model) equal to .85 is consistent
with the data, and a tolerance (to establish
datum/model compatibility) of .8 units was supplied
as part of the problem statement. We will accept
the RANSAC-supplied model without external
smoothing of the final consensus set; thus, we
would like to obtain a consensus set of at least
seven data points before terminating our search.
Since there are only seven data points total and
one of these points is a gross error, it is obvious
that we will not find a consensus set of the
desired size, and so we will terminate with the
largest set we are able to find. The theory
presented earlier indicates that if we take two
data points at a time, compute the line through
them and measure the deviations of the remaining
points from this line, we should expect to find a
suitable consensus set within two or three trials;
however, because of the limited amount of data, we
might be willing to try all 21 combinations to find
the largest consensus set. In either case, we
easily find the consensus set containing the six
valid data points and the line that they imply.
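The exhaustive variant of this example (trying every pair of points) can be sketched in a few lines. The seven points below are illustrative stand-ins containing one gross error; they are an assumption and not a transcription of Figure 1. Deviation is taken here as perpendicular distance to the line, which is also an assumption, and the tolerance of .8 units is the one quoted in the text.

```python
import math
from itertools import combinations

def line_through(p, q):
    """Line a*x + b*y + c = 0 through p and q, scaled so a^2 + b^2 = 1."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = math.sqrt(a * a + b * b)
    a, b = a / norm, b / norm
    return a, b, -(a * p[0] + b * p[1])

def consensus_set(points, line, tol):
    """All points whose perpendicular distance to the line is within tol."""
    a, b, c = line
    return [pt for pt in points if abs(a * pt[0] + b * pt[1] + c) <= tol]

def best_consensus(points, tol):
    """Try every pair of points and keep the largest consensus set found."""
    best_line, best_set = None, []
    for p, q in combinations(points, 2):
        line = line_through(p, q)
        found = consensus_set(points, line, tol)
        if len(found) > len(best_set):
            best_line, best_set = line, found
    return best_line, best_set

# Hypothetical data: six points near the line y = x and one gross error.
POINTS = [(0, 0), (1, 1), (2, 2), (3, 2), (3, 3), (4, 4), (10, 2)]
n_pairs = len(list(combinations(POINTS, 2)))   # 21 pairs of 7 points
line, inliers = best_consensus(POINTS, tol=0.8)
```

With these data the largest consensus set has six members and excludes the gross error (10, 2); a random-selection version would simply replace the loop over all pairs with a bounded number of random draws.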


THE  LOCATION  DETERMINATION  PROBLEM  (LDP) 

A  core  problem  in  image  analysis  is  that  of 
establishing  a  correspondence  between  the  elements 
of  two  representations  of  a  given  scene.  One 
variation  of  this  problem,  specially  important  in 
cartography,  is  to  determine  the  location  in  space 
from  which  an  image  or  photograph  was  obtained  by 
recognizing  a  set  of  landmarks  ("control  points") 
appearing  in  the  image  (this  is  variously  called 
the  problem  of  determining  the  elements  of  exterior 
camera orientation, or the camera calibration
problem, or the image-to-data-base correspondence
problem). It is routinely solved using a
least-squares technique (e.g., see Wolf [1974] or Keller
[1966]) with a human operator interactively
establishing  the  association  between  image  points 
and  the  three-dimensional  coordinates  of  the 
corresponding  landmarks.  However,  in  a  fully 
automated  system  where  the  correspondences  must  be 
based  on  the  decisions  of  marginally  competent 
feature  detectors,  least  squares  is  often  incapable 
of  dealing  with  the  gross  errors  that  may  result: 
this  consideration,  discussed  at  length  in  the 
preceding section, is illustrated for the Location
Determination  Problem  in  an  example  to  be  presented 
later  (see  the  section  on  experimental  results). 

In  this  section  we  present  a  new  solution  to 
the  Location  Determination  Problem  (LDP)  based  on 
the  RANSAC  paradigm,  which  is  unique  in  its  ability 
to  tolerate  gross  errors  in  the  input  data.  We 


will  first  examine  the  conditions  under  which  a 
solution  to  the  LDP  is  possible  and  describe  new 
results  concerning  this  question;  we  then  present  a 
complete  description  of  the  RANSAC-based  algorithm, 
and  finally,  describe  experimental  results  obtained 
through  use  of  the  algorithm. 

We  formally  define  the  LDP  as  follows: 

Given a set of m landmarks, whose 3-D coordinates
are known in some coordinate frame, and
given  an  image  in  which  some  subset  of  the  m 
landmarks  is  visible,  determine  the  location 
relative  to  the  coordinate  system  of  the 
landmarks  from  which  the  image  was  obtained. 

We  will  initially  assume  that  we  know  the 
correspondences  between  n  image  points  and 
landmarks:  later  we  consider  the  situation  in  which 
some  of  these  correspondences  are  invalid.  We  will 
also  assume  that  both  the  principal  point  in  the 
image  plane  (where  the  optical  axis  of  the  camera 
pierces  the  image  plane)  and  the  focal  length  of 
the  imaging  system  are  known;  thus  (see  Figure  2) 
we  can  easily  compute  the  angle  to  any  pair  of 
landmarks  from  the  Center  of  Perspective  (CP). 
Finally,  we  assume  that  the  camera  resides  outside 
and  "above"  a  convex  hull  enclosing  the  control 
points. 
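The remark that the angle to any pair of landmarks can be computed from the principal point and focal length can be illustrated as follows. This sketch assumes an ideal pinhole camera with image coordinates and focal length expressed in the same units; the function names are hypothetical.

```python
import math

def ray(image_pt, principal_pt, focal_length):
    """Direction from the CP through an image point, in camera coordinates."""
    return (image_pt[0] - principal_pt[0],
            image_pt[1] - principal_pt[1],
            focal_length)

def angle_between(p1, p2, principal_pt, focal_length):
    """Angle (radians) at the CP subtended by two image points."""
    u = ray(p1, principal_pt, focal_length)
    v = ray(p2, principal_pt, focal_length)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)

# Two image points placed symmetrically about the principal point,
# each displaced by one focal length: the rays are perpendicular.
theta = angle_between((100.0, 0.0), (-100.0, 0.0), (0.0, 0.0), 100.0)
```

These pairwise angles are exactly the quantities that the perspective-n-point formulation takes as given.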

We  will  later  demonstrate  (Appendix  A)  that  if 
we  can  compute  the  lengths  of  the  rays  from  the  CP 
to  three  of  the  landmarks,  then  we  can  directly 
solve for the location of the CP (and the
orientation  of  the  image  plane  if  desired).  Thus, 
an  equivalent,  but  mathematically  more  concise 
statement  of  the  LDP,  is: 

Given  the  relative  spatial  locations  of  n 
control  points,  and  given  the  angle  to  every 
pair  of  control  points  from  an  additional 
point  called  the  Center  of  Perspective  (CP), 
find  the  lengths  of  the  line  segments  ("legs") 
joining  the  CP  to  each  of  the  control  points. 

We call this the "perspective-n-point" problem
(PnP).

In  order  to  apply  the  RANSAC  paradigm,  we  wish 
to  determine  the  smallest  value  of  n  for  which  it 
is  possible  to  solve  the  PnP  problem. 

Solution  of  the  Perspective-N-Point  Problem 

The P1P problem (n = 1) provides no
constraining information, and thus an infinity of
solutions is possible. The P2P problem (n = 2),
illustrated in Figure 3, also admits an infinity of
solutions; the CP can reside anywhere on a circle
of diameter Rab/sin(θab), rotated in space about
the chord (line) joining the two control points A
and B.

The P3P problem (n = 3) requires that we
determine the lengths of the three legs of a
tetrahedron, given the base dimensions and the face
angles of the opposing trihedral angle (see
Figure 4). The solution to this problem is implied
by the three equations [A*]:

(Rab)^2 = a^2 + b^2 - 2*a*b*cos(θab)

(Rac)^2 = a^2 + c^2 - 2*a*c*cos(θac)     [A*]

(Rbc)^2 = b^2 + c^2 - 2*b*c*cos(θbc)

It is known that n independent polynomial
equations, in n unknowns, can have no more
solutions than the product of their respective
degrees. Thus, the system A* can have a maximum of
eight solutions. However, because every term in
the system A* is either a constant, or of second
degree, for every real positive solution there is a
geometrically isomorphic negative solution. Thus,
there are at most four positive solutions to A*,
and in Figure 6 we show an example demonstrating
that the upper bound of four solutions is
attainable.

In  Appendix  A  we  derive  an  explicit  algebraic 
solution  for  the  system  A*.  This  is  accomplished 
by  reducing  A*  to  a  biquadratic  (quartic) 
polynomial  in  one  unknown  representing  the  ratio  of 
two legs of the tetrahedron, and then directly
solving  this  equation  (we  also  present  a  very 
simple  iterative  method  for  obtaining  the  solutions 
from  the  given  problem  data). 

For the case n = 4, when all four control
points lie in a common plane (not containing the
CP, and such that no more than two of the control
points lie on any single line), we provide a
technique, in Appendix B, that will always produce
a unique solution. Surprisingly, when all four
control points do not lie in the same plane, a
unique solution cannot be assured: an
example, presented in Figure 6, shows that at least
two solutions are possible for the P4P problem with
the control points in "general position."

To solve for the location of the CP in the
case of four nonplanar control points, we can use
the algorithm presented in Appendix A on two
distinct subsets of the control points taken three
at a time; the solution(s) common to both subsets
locate the CP to within the ambiguity inherent in
the given information.

The approach used to construct the example
shown in Figure 6 can be extended to any number of
additional points. It is based on the principle
depicted in Figure 3: if the CP and any number of
control points lie on the same circle, then the
angle between any pair of control points and the CP
will be independent of the location on the circle
of the CP (and hence the location of the CP cannot
be determined). Thus, we are able to construct the
example shown in Figure 7, in which five control
points in general position imply two solutions to
the P5P problem. While the same technique will
work for six or more control points, four or more
of these points must now lie in the same plane and are
thus no longer in general position.

To prove that six (or more) control points in
general position will always produce a unique
solution to the P6P problem, we note that for this
case we can always solve for the 12 coefficients of
the 3 x 4 matrix T that specifies the mapping (in
homogeneous coordinates) from three-space to
two-space; each of the six correspondences provides
three new equations and introduces one additional
unknown (the homogeneous coordinate scale factor).
Thus, for six control points, we have 18 linear
equations to solve for the 18 unknowns (actually,
it can be shown that, at most, 17 of the unknowns
are independent). Given the transformation
matrix T, we can construct an additional
(synthetic) control point lying in a common plane
with three of the given control points and compute
its location in the image plane; the technique
described in Appendix B can now be used to find a
unique solution.


IMPLEMENTATION  DETAILS  AND  EXPERIMENTAL  RESULTS 

The RANSAC/LD Algorithm

The  RANSAC/LD  algorithm  accepts  as  input  the 
following  data: 

(1)  A list L of m 6-tuples--each 6-tuple
containing the 3-D spatial coordinates of
a control point, its corresponding 2-D
image plane coordinates, and an optional
number giving the expected error (in
pixels) of the given location in the
image plane.

(2)  The focal length of the imaging system
and the image plane coordinates of the
principal point.

(3)  The probability (1-w) that a 6-tuple
contains a gross mismatch.

(4)  A "confidence" number G which is used to
set the internal thresholds for
acceptance of intermediate results
contributing to a solution. A confidence
number of one forces very conservative
behavior on the algorithm; a confidence
number of zero will call almost anything
a valid solution.

The  RANSAC/LD  algorithm  produces  as  output  the 
following  information: 

(1)  The 3-D spatial coordinates of the lens
center (i.e., the Center of Perspective),
and an estimate of the corresponding
error.

(2)  The spatial orientation of the image
plane.

The  RANSAC/LD  algorithm  operates  as  follows: 

(1)  Three 6-tuples are selected from list L
by a quasi-random method that ensures a
reasonable spatial distribution for the
corresponding control points. This
initial selection is called S1.

(2)  The CP (called CP1) corresponding to
selection S1 is determined using the
closed-form solution provided in
Appendix A; multiple solutions are
treated as if they were obtained from
separate selections in the following
steps.

(3)  The error in the derived location of CP1
is estimated by perturbing the input
coordinates (either by the amount
specified in the 6-tuples or by a default
value of one pixel) and recomputing the
effect this would have on the location of
the CP1.

(4)  Given the error estimate for the CP1, we
use the technique described in Bolles
[1978] to determine error ellipses
(dimensions based upon the supplied
confidence number) in the image plane for
each of the control points specified in
list L; if the associated image
coordinates reside within the
corresponding error ellipse, then the
6-tuple is appended to the consensus set
S1/CP1.

(5)  If the size of S1/CP1 equals or exceeds
some threshold value t (nominally equal
to a value between 7 and mw), then the
consensus set S1/CP1 is supplied to a
least-squares routine (see Bolles [1978]
or Gennery [1975]) for final
determination of the CP location and
image plane orientation. Otherwise, the
above steps are repeated with a new
random selection S2, S3, ...

(6)  If the number of iterations of the above
steps exceeds k = [log(1-G)]/[log(1-w^3)],
then the largest consensus set found so
far is used to compute the final solution
(or we terminate in failure if this
largest consensus set contains fewer than
six members).
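The control flow of steps (1) through (6) can be sketched generically. In the outline below the minimal-subset solver and the compatibility test are caller-supplied functions (hypothetical names `solve_minimal` and `is_compatible`) standing in for the Appendix A solver and the error-ellipse test; the quasi-random selection, error estimation, and final least-squares fit are omitted. The toy one-dimensional problem used to exercise the loop is purely an assumption for illustration.

```python
import math
import random

def max_trials(confidence, w, n):
    """k = log(1 - G) / log(1 - w**n), rounded up (needs 0 < w**n < 1, G < 1)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - w ** n))

def ransac(data, n, solve_minimal, is_compatible, t, k, rng):
    """Return (model, consensus set); stops early once the set reaches size t."""
    best_model, best_set = None, []
    for _ in range(k):
        sample = rng.sample(data, n)
        # A minimal subset may yield several candidate models (as in P3P);
        # each is treated as if it came from a separate selection.
        for model in solve_minimal(sample):
            consensus = [d for d in data if is_compatible(model, d)]
            if len(consensus) > len(best_set):
                best_model, best_set = model, consensus
            if len(consensus) >= t:
                return model, consensus
    return best_model, best_set

# Toy problem: estimate a scalar from readings with two gross errors.
READINGS = [5.0, 5.1, 4.9, 5.2, 4.8, 50.0, -20.0]
model, consensus = ransac(
    READINGS, n=1,
    solve_minimal=lambda sample: [sample[0]],
    is_compatible=lambda m, d: abs(d - m) <= 0.5,
    t=5, k=200, rng=random.Random(1))
k_ld = max_trials(0.99, 0.5, 3)   # iteration cap for G = .99, w = .5, n = 3
```

If the loop exhausts k trials, the caller would apply the failure test of step (6) to the size of the returned consensus set.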

Experimental Results

To  demonstrate  the  validity  of  our  theoretical 
results,  we  performed  three  experiments.  In  the 
first  experiment  we  found  a  specific  Location 
Determination  Problem  in  which  the  common  least- 
squares  pruning  heuristic  failed,  and  showed  that 
RANSAC  successfully  solved  this  problem.  In  the 
second  experiment,  we  applied  RANSAC  to  fifty 
synthetic  problems  in  order  to  check  the 
reliability  of  the  approach  over  a  wide  range  of 
parameter  values.  In  the  third  experiment  we  used 
standard  feature  detection  techniques  to  locate 
landmarks  in  an  aerial  image  and  then  used  RANSAC 
to  determine  the  position  and  orientation  of  the 
camera . 

A  Location  Determination  Problem  Example  of  a 
Least-Squares  Pruning  Error 


An  alternative  to  least  squares  would  be  to 
average  the  parameters  computed  from  random  triples 
in  the  consensus  set  that  fall  within  (say)  the 
center 50% of the associated histogram.


The LDP in this experiment was based upon 20
landmarks and their locations in an image. Five of
the twenty correspondences were gross errors; that
is, their given locations in the image were further
than 10 pixels from their actual locations. The
image locations for the "good" correspondences were
normally distributed about their actual locations
with a standard deviation of one pixel.

The heuristic to prune gross errors was the
following:

*  Use all of the correspondences to
instantiate a model.

*  On the basis of that model, delete the
correspondence that has the largest
deviation from its predicted image
location.

*  Instantiate a new model without that
correspondence.

*  If the new model implies a normalized error
for the deleted correspondence that is
larger than three standard deviations,
assume that it is a gross error, leave it
out, and continue deleting correspondences.
Otherwise, assume that it is a good
correspondence and return the model that
included it as the solution to the problem.

This  heuristic  successfully  deleted  two  of  the 
gross  errors;  but  after  deleting  a  third,  it 
decided  that  the  new  model  did  not  imply  a 
significantly  large  error,  so  it  returned  a 
solution  based  upon  eighteen  correspondences,  three 
of  which  were  gross  errors. 


When RANSAC was applied to this problem, it
located the correct solution on the second triple
of selected points. The final consensus set
contained all of the good correspondences and none
of the gross errors.

Fifty Synthetic Location Determination Problems

In this experiment we applied RANSAC to fifty
synthetic LDPs. Each problem was based upon thirty
landmark-to-image correspondences. A range of
probabilities was used to determine the number of
gross errors in the problems; the image location of
a gross error was at least 10 pixels from its
actual location. The location of a good
correspondence was distributed about its actual
location with a normal distribution having a
standard deviation of one pixel. Two different
camera positions were used--one looking straight
down on the landmarks and one looking at them from
an oblique angle. The RANSAC algorithm described
earlier in this section was applied to these
problems; however, the simple iterative technique
described in Appendix A was used to locate
solutions to the P3P problems in place of the
closed-form method also described in that appendix,
and a second least-squares fit was used to extend
the final consensus set (as suggested in the second
section of this paper). Table 1 summarizes the
results for ten typical problems (RANSAC
successfully avoided including a gross error in its
final consensus set in all of the problems); in
five of these problems the probability of a good
correspondence was 0.8 and in the other five
problems it was 0.6. The execution time for the
current program is approximately one second for
each camera position considered.


          No. of      No. of Corresp.   No. of       No. of Camera
          Good        in Final          Triples      Positions
          Corresp.    Consensus Set     Considered   Considered

w = .8    22          19                 6           10
          23          27                 1            3
          19          19                 2            3
          25          25                 1            2
          24          23                 3            8

w = .6    21          20                11           21
          17          17                 1            1
          17          16                 6            8
          18          16                 9           21
          21          18                 9           15

TABLE 1


A "Real" Location Determination Problem

Cross-correlation  was  used  to  locate  25 
landmarks  in  an  aerial  image  taken  from 
approximately 4,000 feet with a 6-inch lens. The
image  was  digitized  on  a  grid  of  2,000  by  2,000 
pixels  which  implies  a  ground  resolution  of 
approximately  two  feet  per  pixel.  Three  gross 
errors  were  made  by  the  correlation  feature 
detector.  When  RANSAC  was  applied  to  this  problem, 
it  located  a  consensus  set  of  17  on  the  first 
triple  selected  and  then  extended  that  set  to 
include  all  22  good  correspondences  after  the 
initial  least-squares  fit.  The  final  standard 
deviations  about  the  camera  parameters  were  as 
follows:

X:  0.1 feet     Heading:  .01 degrees

Y:  6.4 feet     Pitch:    .10 degrees

Z:  1 feet       Roll:     .12 degrees


CONCLUDING  COMMENTS 

In  this  paper  we  have  introduced  a  new 
paradigm,  Random  Sample  Consensus  (RANSAC),  for 
fitting  a  model  to  experimental  data.  RANSAC  is 
capable  of  interpreting/smoothing  data  containing  a 
significant  percentage  of  gross  errors,  and  thus  is 
ideally  suited  for  applications  in  automated  image 


analysis where interpretation is based on the data
provided  by  error-prone  feature  detectors. 

A  major  portion  of  this  paper  describes  the 
application  of  RANSAC  to  the  Location  Determination 
Problem  (LDP):  given  an  image  depicting  a  set  of 
landmarks  with  known  locations,  determine  that 
point  in  space  from  which  the  image  was  obtained. 
Most  of  the  results  we  present  concerning  solution 
techniques  and  the  geometry  of  the  LDP  problem  are 
either new or not generally known. The current
photogrammetric literature offers no analytic
solution, other than variants of least squares and
the Church method, for solving the
perspective-n-point problems. The Church method, which provides
an iterative solution for the P3P problem, is
presented (see Church [1945] or Wolf [1974])
without  any  indication  that  more  than  one 
physically  real  solution  is  possible;  there  is 
certainly  no  indication  that  anyone  realizes  that 
physically  real  multiple  solutions  are  possible  for 
more  than  three  control  points  in  general  position. 
(It  should  be  noted  that  because  the  multiple 
solutions  can  be  arbitrarily  close  together,  even 
when  an  iterative  technique  is  initialized  to  a 
value  close  to  the  correct  solution,  there  is  no 
assurance  that  it  will  converge  to  the  desired 
value).

In  the  section  on  the  LDP  problem  (and 
associated  appendices)  we  have  completely 
characterized the P3P problem and provided a
closed-form solution. We have shown that multiple
physically real solutions can exist for the P4P and
P5P problems, but also demonstrated that a unique
solution is assured when four of the control points
reside on a common plane (solution techniques are
provided for each of these cases). The issue of
determining  the  maximum  number  of  solutions 
possible  for  the  P4P  and  P5P  problems  remains  open, 
but  we  have  shown  that  a  unique  solution  exists  for 
the  P6P  problem  when  the  control  points  are  in 
general  position. 


REFERENCES


1.  R. C. Bolles, L. H. Quam, M. A. Fischler, and
H. C. Wolf, "Automatic Refinement Of Image-To-
Data Base Correspondences," in Proceedings:
Image Understanding Workshop (November 1978).

2.  E. Church, "Revised Geometry of the Aerial
Photograph," Bulletin of Aerial Photogrammetry
No. 15, Syracuse University (1945).

3.  S. D. Conte, Elementary Numerical Analysis
(McGraw Hill, 1965).

4.  E. Dehn, Algebraic Equations (Dover, 1960).

5.  R. O. Duda and P. E. Hart, Pattern
Classification and Scene Analysis (Wiley-
Interscience, 1973).

6.  D. B. Gennery, "Least-Squares Stereo-Camera
Calibration," Stanford Artificial Intelligence
Project internal memo (1975).

7.  M. Keller and G. C. Tewinkel, "Space Resection
in Photogrammetry," ESSA Technical Report
32, U.S. Coast and Geodetic Survey (1966).

8.  D. F. Rogers and J. A. Adams, Mathematical
Elements for Computer Graphics (McGraw Hill,
1976).

9.  H. W. Sorenson, "Least-Squares Estimation:
from Gauss to Kalman," IEEE Spectrum (July
1970).

10.  P. R. Wolf, Elements of Photogrammetry (McGraw
Hill, 1974).


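Appendix A

AN ANALYTIC SOLUTION FOR THE
PERSPECTIVE-3-POINT PROBLEM

The main body of this paper established that
P3P problems can have as many as four solutions.
In this appendix we derive a closed-form expression
for obtaining these solutions. Our approach
involves three steps: find the lengths of the legs
of the ("perspective") tetrahedron, given the base
(defined by the three control points) and the face
angles of the opposing trihedral angle (the three
angles to the three pairs of control points as
viewed from the CP); locate the CP with respect to the
3-D reference frame in which the control points
were originally specified; and finally, compute the
orientation of the image plane with respect to the
reference frame.

A Solution for the Perspective Tetrahedron

Given the lengths of the three sides of the
base of a tetrahedron (Rab, Rac, Rbc), and given the
corresponding face angles of the opposing trihedral
angle (θab, θac, θbc), find the lengths of the three
remaining sides of the tetrahedron (a, b, c).

A solution to the above problem can be
obtained by simultaneously solving the system of
equations:

[A1]  (Rab)^2 = a^2 + b^2 - 2*a*b*cos(θab)

[A2]  (Rac)^2 = a^2 + c^2 - 2*a*c*cos(θac)

[A3]  (Rbc)^2 = b^2 + c^2 - 2*b*c*cos(θbc)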


We now proceed as follows:

[A4]  Let b = x*a and c = y*a

[A5]  (Rac)^2 = a^2 + (y^2)*(a^2) - 2*(a^2)*y*cos(θac)

[A6]  (Rab)^2 = a^2 + (x^2)*(a^2) - 2*(a^2)*x*cos(θab)

[A7]  (Rbc)^2 = (x^2)*(a^2) + (y^2)*(a^2)
                - 2*(a^2)*x*y*cos(θbc)

From [A5] and [A7]

[A8]  ((Rbc)^2)*[1 + (y^2) - 2*y*cos(θac)] =
      ((Rac)^2)*[(x^2) + (y^2) - 2*x*y*cos(θbc)]

From [A6] and [A7]

[A9]  ((Rbc)^2)*[1 + (x^2) - 2*x*cos(θab)] =
      ((Rab)^2)*[(x^2) + (y^2) - 2*x*y*cos(θbc)]

[A10]  Let K1 = (Rbc)^2/(Rac)^2 and K2 = (Rbc)^2/(Rab)^2

From [A8] and [A10]

[A11]  0 = (y^2)*[1 - K1]
           + 2*y*[K1*cos(θac) - x*cos(θbc)]
           + [(x^2) - K1]

From [A9] and [A10]

[A12]  0 = (y^2) + 2*y*[-x*cos(θbc)]
           + [(x^2)*(1-K2) + 2*x*K2*cos(θab) - K2]

Equations [A11] and [A12] have the form:

[A13]  0 = m*(y^2) + p*y + q

[A14]  0 = m'*(y^2) + p'*y + q'

Multiplying [A13] and [A14] by m' and m,
respectively, and subtracting, we obtain:

[A15]  0 = [p*m' - p'*m]*y + [m'*q - m*q']

Multiplying [A13] and [A14] by q' and q,
respectively, subtracting, and dividing by y, we
obtain:

0 = [m'*q - m*q']*(y^2) + [p'*q - p*q']*y

[A16]  0 = [m'*q - m*q']*y + [p'*q - p*q']

Assuming m'*q is not equal (#) to m*q', that is

[(x^2) - K1] # [(x^2)*(1-K1)*(1-K2)
               + 2*x*K2*(1-K1)*cos(θab) - (1-K1)*K2]

then [A15] and [A16] are equivalent to [A13] and
[A14]. We now multiply [A15] by (m'*q - m*q') and
[A16] by (p*m' - p'*m), and subtract to obtain:

[A17]  0 = (m'*q - m*q')^2
           - [p*m' - p'*m]*[p'*q - p*q']

Expanding [A17] and grouping terms, we obtain a
biquadratic (quartic) polynomial in x:

[A18]  0 = G4*(x^4) + G3*(x^3)
           + G2*(x^2) + G1*(x) + G0

where

[A19]  G4 = (K1*K2 - K1 - K2)^2
            - 4*K1*K2*[cos(θbc)^2]

[A20]  G3 = 4*[K1*K2 - K1 - K2]*K2*(1-K1)*cos(θab)
            + 4*K1*cos(θbc)*[(K1*K2 + K2 - K1)*cos(θac)
            + 2*K2*cos(θab)*cos(θbc)]

[A21]  G2 = [2*K2*(1-K1)*cos(θab)]^2
            + 2*[K1*K2 + K1 - K2]*[K1*K2 - K1 - K2]
            + 4*K1*[(K1-K2)*(cos(θbc)^2)
            + (1-K2)*K1*(cos(θac)^2)
            - 2*K2*(1+K1)*cos(θab)*cos(θac)*cos(θbc)]

[A22]  G1 = 4*(K1*K2 + K1 - K2)*K2*(1-K1)*cos(θab)
            + 4*K1*[(K1*K2 - K1 + K2)*cos(θac)*cos(θbc)
            + 2*K1*K2*cos(θab)*(cos(θac)^2)]

[A23]  G0 = (K1*K2 + K1 - K2)^2
            - 4*(K1^2)*K2*(cos(θac)^2)

Roots of [A18] can be found in closed form (see
Dehn [1960]) or by iterative techniques (see Conte
[1965]).

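For each positive real root of [A18], we determine
a single positive real value for each of the sides
a and b. From [A6] we have:

[A24]  a = Rab / SQRT[(x^2) - 2*x*cos(θab) + 1]

and from [A4] we obtain:

[A25]  b = a*x

If m'*q # m*q', then from [A16] we have:

[A26]  y = [p'*q - p*q'] / [m*q' - m'*q]

If m'*q = m*q', then [A26] is undefined, and we
obtain two values of y from [A5]:

[A27]  y = cos(θac)
           +- SQRT[(cos(θac))^2 + ((Rac)^2 - (a^2))/(a^2)]

For each real positive value of y, we obtain a
value of c from [A4]:

[A28]  c = y*a

When values of y are obtained from [A5] rather
than [A26], the resulting solutions can be invalid;
they must be shown to satisfy [A3] before they are
accepted.

It should be noted that because each root of
[A18] can conceivably lead to two distinct
solutions, the existence of the biquadratic does
not by itself imply a maximum of four solutions to
the P3P problem; some additional argument, such as
the one given in the main body of this paper, is
necessary to establish the upper bound of four
solutions.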

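Example

Consider a perspective tetrahedron with

Rab = Rac = Rbc = 2*SQRT(3)

cos(θab) = cos(θac) = cos(θbc)
         = [(a^2) + (b^2) - (Rab)^2]/(2*a*b) = 20/32

(the value 20/32 corresponds to legs a = b = 4).
Substituting these values into equations [A19]
through [A23], we obtain the coefficients of the
biquadratic defined in [A18]:

[-.5625, 3.515625, -5.90625, 3.515625, -.5625]

The roots of the above equation are:

[1, 1, 4, 0.25]

For each root we have:

ROOT     a     b     m*q' - m'*q     y       c

1        4     4     0               1       4
1        4     4     0               .25     1
4        1     4     -15             4       4
.25      4     1     .9375           1       4

For the double root x = 1, m'*q = m*q', so the two
values of y are obtained from [A27]; both satisfy
[A3].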
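The coefficient formulas [A19] through [A23] and the back-substitution [A24] through [A26] can be checked numerically against the example. The sketch below is an illustration only: the function and parameter names are our own, the cosines of the three face angles are passed in directly, and the degenerate case m'*q = m*q' (which requires [A27] and a check against [A3]) is deliberately left out.

```python
import math

def quartic_coefficients(rab, rac, rbc, cab, cac, cbc):
    """Return [G4, G3, G2, G1, G0] of [A18]; cab, cac, cbc are cosines."""
    k1 = rbc ** 2 / rac ** 2
    k2 = rbc ** 2 / rab ** 2
    g4 = (k1 * k2 - k1 - k2) ** 2 - 4 * k1 * k2 * cbc ** 2
    g3 = (4 * (k1 * k2 - k1 - k2) * k2 * (1 - k1) * cab
          + 4 * k1 * cbc * ((k1 * k2 + k2 - k1) * cac + 2 * k2 * cab * cbc))
    g2 = ((2 * k2 * (1 - k1) * cab) ** 2
          + 2 * (k1 * k2 + k1 - k2) * (k1 * k2 - k1 - k2)
          + 4 * k1 * ((k1 - k2) * cbc ** 2 + (1 - k2) * k1 * cac ** 2
                      - 2 * k2 * (1 + k1) * cab * cac * cbc))
    g1 = (4 * (k1 * k2 + k1 - k2) * k2 * (1 - k1) * cab
          + 4 * k1 * ((k1 * k2 - k1 + k2) * cac * cbc
                      + 2 * k1 * k2 * cab * cac ** 2))
    g0 = (k1 * k2 + k1 - k2) ** 2 - 4 * k1 ** 2 * k2 * cac ** 2
    return [g4, g3, g2, g1, g0]

def evaluate(coeffs, x):
    """Horner evaluation of G4*x^4 + G3*x^3 + G2*x^2 + G1*x + G0."""
    value = 0.0
    for g in coeffs:
        value = value * x + g
    return value

def legs_from_root(x, rab, rac, rbc, cab, cac, cbc):
    """Legs (a, b, c) for a root x, valid only when m'*q != m*q'."""
    k1 = rbc ** 2 / rac ** 2
    k2 = rbc ** 2 / rab ** 2
    m, p, q = 1 - k1, 2 * (k1 * cac - x * cbc), x * x - k1            # [A11]
    m2, p2 = 1.0, -2 * x * cbc                                         # [A12]
    q2 = x * x * (1 - k2) + 2 * x * k2 * cab - k2
    a = rab / math.sqrt(x * x - 2 * x * cab + 1)                       # [A24]
    y = (p2 * q - p * q2) / (m * q2 - m2 * q)                          # [A26]
    return a, a * x, a * y                                             # [A25], [A28]

R = 2 * math.sqrt(3)
COS = 20 / 32
coeffs = quartic_coefficients(R, R, R, COS, COS, COS)
legs_x4 = legs_from_root(4.0, R, R, R, COS, COS, COS)
legs_x025 = legs_from_root(0.25, R, R, R, COS, COS, COS)
```

For these inputs the coefficients agree with the list given in the example, the polynomial vanishes at x = 1, 4, and 0.25, and the roots 4 and 0.25 yield the leg triples (1, 4, 4) and (4, 1, 4).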


An Iterative Solution for the Perspective
Tetrahedron (see Figure 8)

A simple way to locate solutions to P3P
problems, which is an adequate substitute
for  the  more  involved  procedure  described  in  the 
preceding  subsection,  is  to  slide  one  vertex  of  the 
control-point  triangle  down  its  leg  of  the 
tetrahedron  and  look  for  positions  of  the  triangle 
that  permit  the  other  two  vertices  to  lie  on  their 
respective  legs.  If  vertex  A  is  at  a  distance  "a" 
from L (L is the center of perspective), the
lengths of the sides AB and AC restrict the
triangle  to  four  possible  positions.  Given  the 
angle  between  legs  LA  and  LB,  compute  the  distance 
of  point  A  from  the  line  LB  and  then  compute  points 
B1  and  B2  on  LB  that  are  at  the  proper  distance 
from  A  to  insert  a  line  segment  of  length  AB. 
Similarly,  we  compute  (at  most)  two  locations  for  C 
on  its  leg.  Thus,  given  a  position  for  A,  we  have 
found  (at  most)  four  positions  for  a  triangle  that 
has  one  side  of  length  AB  and  one  of  length  AC. 

The  lengths  of  the  third  sides  (BC)  of  the  four 
triangles  vary  (non-linear ly)  as  point  A  is  moved 
down  its  leg  Solutions  to  the  problem  can  be 
obtained  by  iteratively  repositioning  A  to  imply  a 
third  side  of  the  required  length. 
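A minimal sketch of this sliding procedure follows. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, plain bisection stands in for the unspecified "iterative repositioning", and the caller is assumed to supply an interval of "a" on which the chosen triangle's third-side error changes sign.

```python
import math

def positions_on_leg(a, cos_angle, side):
    """Distances from L along another leg where a segment of length `side` from A can end."""
    foot = a * cos_angle                                  # foot of the perpendicular from A
    disc = side ** 2 - a ** 2 * (1.0 - cos_angle ** 2)    # side^2 minus (distance of A from the leg)^2
    if disc < 0.0:
        return []                                         # the segment cannot reach the leg
    half = math.sqrt(disc)
    return [t for t in (foot + half, foot - half) if t > 0.0]

def candidate_triangles(a, cos_ab, cos_ac, cos_bc, r_ab, r_ac):
    """Up to four (b, c, implied BC length) triples for a given position of A."""
    out = []
    for b in positions_on_leg(a, cos_ab, r_ab):
        for c in positions_on_leg(a, cos_ac, r_ac):
            bc = math.sqrt(max(0.0, b * b + c * c - 2.0 * b * c * cos_bc))
            out.append((b, c, bc))
    return out

def bisect(f, lo, hi, steps=60):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Equilateral example of this appendix: sides 2*sqrt(3), face-angle cosines 5/8.
R, COS = 2.0 * math.sqrt(3.0), 5.0 / 8.0
error = lambda a: candidate_triangles(a, COS, COS, COS, R, R)[0][2] - R
a_solution = bisect(error, 3.0, 4.3)   # follows the triangle with both far intersections
```

Each of the (at most) four triangles has its own error curve, so in practice the search would be repeated per triangle and per bracketing interval.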

Computing the 3-D Location of the Center of
Perspective (see Figure 9)

Given  the  three-dimensional  locations  of  the 
three  control  points  of  a  perspective  tetrahedron, 
and  the  lengths  of  the  three  legs,  the  3-D  location 
of  the  center  of  perspective  can  be  computed  as 
follows : 

(1)  Construct a plane P1 that is normal to AB and passes through
     the center of perspective, L. This plane can be constructed
     without knowing the position of L, which is what we are
     trying to compute. Consider the face of the tetrahedron that
     contains vertices A, B, and L. Knowing the lengths of sides
     LA, LB and AB, we can use the law of cosines to find the
     angle LAB, and then the projection QA of LA on AB. (Note that
     angle LQA is a right angle, and the point Q is that point on
     line AB that is closest to L.) Construct a plane normal to AB
     passing through Q; this plane also passes through L.

(2)  Similarly construct a plane P2 that is normal to AC and
     passes through L.

(3)  Construct the plane P3 defined by the three points A, B,
     and C.

(4)  Intersect planes P1, P2, and P3. By construction, the point
     of intersection R is the point on P3 that is closest to L.

(5)  Compute the length of the line AR and use that in conjunction
     with the length of LA to compute the length of the line RL,
     which is the distance of L from the plane P3.

(6)  Compute the cross product of vectors AB and AC to form a
     vector perpendicular to P3. Then scale that vector to the
     length of RL and add it to R to get the 3-D location of the
     center of perspective L.
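Steps (1) through (6) can be sketched as below. This is one possible reading, with hypothetical helper and function names: instead of building the planes explicitly, it uses the fact that the law of cosines gives the dot products of (L - A) with AB and AC, and solves a 2x2 system for the point R in the plane of A, B, C. The sign of the normal is an assumption: L is placed on the side of the plane that AB x AC points to.

```python
import math

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def center_of_perspective(A, B, C, la, lb, lc):
    """A, B, C: 3-D control points; la, lb, lc: lengths of legs LA, LB, LC."""
    ab, ac = sub(B, A), sub(C, A)
    # Steps (1)-(2): law of cosines gives (L - A).AB and (L - A).AC.
    p1 = (la ** 2 + dot(ab, ab) - lb ** 2) / 2.0
    p2 = (la ** 2 + dot(ac, ac) - lc ** 2) / 2.0
    # Steps (3)-(4): R = A + s*AB + t*AC is the common point of P1, P2, P3.
    g11, g12, g22 = dot(ab, ab), dot(ab, ac), dot(ac, ac)
    det = g11 * g22 - g12 * g12
    s = (p1 * g22 - p2 * g12) / det
    t = (p2 * g11 - p1 * g12) / det
    R = [A[i] + s * ab[i] + t * ac[i] for i in range(3)]
    # Step (5): right triangle A-R-L gives the height RL.
    ar = sub(R, A)
    rl = math.sqrt(max(0.0, la ** 2 - dot(ar, ar)))
    # Step (6): move from R along the unit normal of P3 by RL.
    n = cross(ab, ac)
    norm = math.sqrt(dot(n, n))
    return [R[i] + rl * n[i] / norm for i in range(3)]

# Synthetic check: leg lengths measured from a known L = (1, 2, 5).
L_est = center_of_perspective([0, 0, 0], [4, 0, 0], [0, 3, 0],
                              math.sqrt(30.0), math.sqrt(38.0), math.sqrt(27.0))
```

If the camera is known to lie on the other side of the control-point plane, the normal should be negated before the final step.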


If the focal length of the camera and the principal point in the
image plane are known, it is possible to compute the orientation
of the image plane with respect to the world coordinate system;
that is, the location of the origin and the orientation of the
image plane coordinate system with respect to the 3-D reference
frame. This can be done as follows:

(1)  Compute the 3-D reference frame coordinates of the center of
     perspective (as described above).

(2)  Compute the 3-D image locations for the three control points:
     Since we know the 3-D coordinates of the CP and control
     points, we can compute the 3-D coordinates of the three rays
     between the CP and the control points. Knowing the focal
     length of the imaging system, we can compute, and subtract
     from each ray, the distance from the CP to the image plane
     along the ray.

(3)  Compute the equation of the plane containing the image using
     three of the image points found in step (2). The normal to
     this plane, passing through the CP, gives us the origin of
     the image plane coordinate system (i.e., the 3-D location of
     the principal point), and the Z axis of this system.

(4)  The 3-D orientation of the image plane X and Y axes can now
     be obtained by computing the 3-D coordinates of a vector from
     the principal point to any of the image points found in (2).
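Steps (2) and (3) can be sketched as follows, under assumptions that the text leaves open: the image plane is modeled as lying between the CP and the scene at distance f, the image coordinates (x, y) of each control point are measured from the principal point, and all function and parameter names are hypothetical.

```python
import math

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(u, s):
    return [a * s for a in u]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def unit(u):
    return scale(u, 1.0 / math.sqrt(dot(u, u)))

def image_plane_frame(cp, controls, image_xy, f):
    """Return (3-D principal point, unit Z axis, 3-D image points)."""
    image_pts = []
    for X, (x, y) in zip(controls, image_xy):
        # Distance from the CP to the image plane along this ray.
        d = math.sqrt(f * f + x * x + y * y)
        image_pts.append(add(cp, scale(unit(sub(X, cp)), d)))
    # Step (3): plane through the three image points.
    n = unit(cross(sub(image_pts[1], image_pts[0]), sub(image_pts[2], image_pts[0])))
    if dot(n, sub(image_pts[0], cp)) < 0.0:
        n = scale(n, -1.0)            # orient the normal from the CP toward the plane
    principal_point = add(cp, scale(n, f))
    return principal_point, n, image_pts

# Synthetic camera at the origin looking along +Z with f = 1.
controls = [[1.0, 2.0, 5.0], [-2.0, 1.0, 4.0], [3.0, -1.0, 6.0]]
image_xy = [(X / Z, Y / Z) for X, Y, Z in controls]
pp, z_axis, pts = image_plane_frame([0.0, 0.0, 0.0], controls, image_xy, 1.0)
```

Step (4) then amounts to subtracting the principal point from any of the returned image points.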


Appendix B

AN ANALYTIC SOLUTION FOR THE PERSPECTIVE-4-POINT
PROBLEM (with all control
points lying in a common plane)

In this appendix, we present an analytic technique for obtaining a
unique solution to the P4P problem when the four given control
points all lie in a common plane.

Problem Statement (see Figure 10)

GIVEN: a correspondence between four points lying in a plane in
3-D space (called the object plane), and four points lying in a
distinct plane (called the image plane); and given the distance
between the center of perspective and the image plane (i.e., the
focal length of the imaging system); and also given the principal
point in the image plane (i.e., the location, in image plane
coordinates, of the point at which the optical axis of the lens
pierces the image plane).

FIND: the 3-D location of the Center of Perspective relative to
the coordinate system of the object plane.


Notation

*  Let the four given image points be labeled {Pi}, and the four
   corresponding object points {Qi}.

*  We will assume that the 2-D Image Plane coordinate system has
   its origin at the principal point (PPI).

*  We will assume that the Object Plane has the equation Z = 0 in
   the reference coordinate system. Standard techniques are
   available to transform from this coordinate system into a
   ground reference frame (e.g., see Duda [1973] or Rogers
   [1976]).

*  Homogeneous coordinates will be assumed (e.g., see Wylie
   [1970]).

*  Primed symbols represent transposed structures.


Solution Procedure

a)  Compute the 3x3 collineation matrix T which maps points from
    Object Plane to Image Plane (a procedure for computing T is
    given later):

    (1)   [Pi] = [T]*[Qi]

    where [Pi] = [ki*xi, ki*yi, ki]'
          [Qi] = [Xi, Yi, 1]'

b)  The ideal line in the Object Plane, with coordinates [0,0,1]',
    is mapped into the vanishing line in the Image Plane [VLI] by
    the transformation:

    (2)   [VLI] = [inv[T]]'*[0,0,1]'

c)  Determine the distance DI from the origin of the Image Plane
    (PPI) to the vanishing line [VLI] = [a1,a2,a3]':

    (3)   DI = a3 / sqrt[(a1)^2 + (a2)^2]

d)  Solve for the dihedral (tilt) angle θ between the Image and
    Object planes:

    (4)   θ = arctan(f/DI)

    where f = focal length

e)  The ideal line in the Image Plane, with coordinates [0,0,1]',
    is mapped into the vanishing line in the Object Plane [VLO] by
    the transform:

    (5)   [VLO] = [T]'*[0,0,1]'

f)  Compute the location of point [PPO] in the Object Plane
    ([PPO] is the point at which the optical axis of the lens
    pierces the object plane):

    (6)   [PPO] = [inv[T]]*[0,0,1]'

g)  Compute the distance DO from [PPO] = [c1,c2,c3]' to the
    vanishing line [VLO] = [b1,b2,b3]' in the Object Plane:

    (7)   DO = (b1*c1 + b2*c2 + b3*c3) / (c3*sqrt[(b1)^2 + (b2)^2])

h)  Solve for the "pan" angle $ as the angle between the normal to
    [VLO] = [b1,b2,b3]' and the X axis in the Object Plane:

    (8)   $ = arctan(-b2/b1)

i)  Determine XSGN and YSGN:

    If a line (parallel to the X axis in the object plane) through
    [PPO] intersects [VLO] to the right of [PPO], then XSGN = 1
    else XSGN = -1. Thus

    (9)   if (b1*c1 + b2*c2 + b3*c3) / (b1*c3) < 0
          then XSGN = 1 else XSGN = -1

    Similarly,

    (10)  if (b1*c1 + b2*c2 + b3*c3) / (b2*c3) < 0
          then YSGN = 1 else YSGN = -1

j)  Solve for the location of the CP in the object plane
    coordinate system:

    (11)  DCP = DO*Sin(θ)

    (12)  XCP = XSGN*abs[DCP*Sin(θ)*Cos($)] + c1/c3

    (13)  YCP = YSGN*abs[DCP*Sin(θ)*Sin($)] + c2/c3

    (14)  ZCP = DCP*Cos(θ)

Note:

If [VLI], as determined in (b), has the coordinates [0,0,k]',
then the image and object planes are parallel (θ = 0). Rather than
continuing with the above procedure, we now solve for the desired
information using similar triangles and Euclidean geometry.
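Steps (c), (d), and (g) through (j) of the solution procedure can be sketched in a few lines once the two vanishing lines and [PPO] are available. The function and variable names below are hypothetical, `phi` stands for the pan angle written "$" in this appendix, and the absolute values taken for DI and DO are an assumption made so that both are treated as distances. The inputs are the [VLI], [VLO], and [PPO] values listed in this appendix's worked example; the parallel-plane case of the Note (a1 = a2 = 0) and the cases b1 = 0 or b2 = 0 are not handled.

```python
import math

def planar_p4p_center(vli, vlo, ppo, f):
    """Return (XCP, YCP, ZCP) from [VLI], [VLO], [PPO] and the focal length f."""
    a1, a2, a3 = vli
    b1, b2, b3 = vlo
    c1, c2, c3 = ppo
    di = abs(a3) / math.hypot(a1, a2)                 # (3)
    theta = math.atan(f / di)                         # (4) tilt angle
    bc = b1 * c1 + b2 * c2 + b3 * c3
    do = abs(bc / (c3 * math.hypot(b1, b2)))          # (7)
    phi = math.atan(-b2 / b1)                         # (8) pan angle
    xsgn = 1 if bc / (b1 * c3) < 0 else -1            # (9)
    ysgn = 1 if bc / (b2 * c3) < 0 else -1            # (10)
    dcp = do * math.sin(theta)                        # (11)
    xcp = xsgn * abs(dcp * math.sin(theta) * math.cos(phi)) + c1 / c3   # (12)
    ycp = ysgn * abs(dcp * math.sin(theta) * math.sin(phi)) + c2 / c3   # (13)
    zcp = dcp * math.cos(theta)                       # (14)
    return xcp, ycp, zcp

# Values from the worked example (f in meters).
cp = planar_p4p_center(vli=(0.0, -5.14991, 1.31713),
                       vlo=(0.000925, 0.000534, 0.843880),
                       ppo=(-51.0636, -120.442, 1.31713),
                       f=0.3048)
```

Because the example's vectors are printed to only three to six significant digits, the result should be expected to agree with the listed XCP, YCP, ZCP only to within a fraction of a unit.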


Computing the Collineation Matrix T

Let:

          | X1  Y1  1 |
   [Q] =  | X2  Y2  1 |  =  [[Q1]', [Q2]', [Q3]']
          | X3  Y3  1 |

          | x1  y1  1 |
   [P] =  | x2  y2  1 |  =  [[P1]', [P2]', [P3]']
          | x3  y3  1 |

   [Q4] = [X4, Y4, 1]'

   [P4] = [x4, y4, 1]'

   [V] = [inv[P]]'*[P4] = [v1,v2,v3]'

   [R] = [inv[Q]]'*[Q4] = [r1,r2,r3]'

   w1 = (v1/r1) * (r3/v3)

   w2 = (v2/r2) * (r3/v3)

          | w1  0   0 |
   [W] =  | 0   w2  0 |
          | 0   0   1 |

Then:

   [T]' = [inv[Q]]*[W]*[P]

Such that:

   [Pi] = ki*[xi,yi,1]' = [T]*[Qi].


Example

Given:

   f  = .3048 meters (12 inches)

   P1 = (-.071263,  .029665)     Q1 = ( -30,   80)
   P2 = (-.053033, -.006379)     Q2 = (-100,  -20)
   P3 = (-.014063,  .061579)     Q3 = ( 140,   50)
   P4 = ( .080120, -.030305)     Q4 = ( -40, -240)

a)              |  .000212   .000236   .000925 |
   [T]'      =  | -.000368   .000137   .000534 |
                | -.025404   .021650   .843879 |

                | 1117.14   -2038.86    0.0     |
   [inv[T]]' =  | 3371.56    2302.22   -5.14991 |
                | -51.0636  -120.442    1.31713 |

b)  [VLI] = [0.0, -5.14991, 1.31713]'

c)  DI = .255758

d)  θ = .872665 radians (50 degrees)

e)  [VLO] = [.000925, .000534, .843880]'

f)  [PPO] = [-51.0636, -120.442, 1.31713]'

g)  DO = 711.196

h)  $ = -.523599 radians (-30 degrees)

i)  XSGN = -1
    YSGN = -1

j)  DCP = 544.808
    XCP = -400.202
    YCP = -300.117
    ZCP = 350.196
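The collineation construction can be exercised on the example's four correspondences. The sketch below is illustrative: the 3x3 helpers and the function name `collineation` are hypothetical, and the only property checked is the defining one, namely that [T]*[Qi] is proportional to [xi, yi, 1]' for all four points. It assumes that no three of the points are collinear, so that [P] and [Q] are invertible.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def inv3(m):
    """Inverse of a 3x3 matrix by the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def collineation(object_pts, image_pts):
    """Return T (3x3) mapping homogeneous object points to image points."""
    Q = [[X, Y, 1.0] for X, Y in object_pts[:3]]      # rows are [Qi]'
    P = [[x, y, 1.0] for x, y in image_pts[:3]]       # rows are [Pi]'
    Q4 = [object_pts[3][0], object_pts[3][1], 1.0]
    P4 = [image_pts[3][0], image_pts[3][1], 1.0]
    V = matvec(transpose(inv3(P)), P4)                # [V] = [inv[P]]'*[P4]
    R = matvec(transpose(inv3(Q)), Q4)                # [R] = [inv[Q]]'*[Q4]
    w1 = (V[0] / R[0]) * (R[2] / V[2])
    w2 = (V[1] / R[1]) * (R[2] / V[2])
    W = [[w1, 0.0, 0.0], [0.0, w2, 0.0], [0.0, 0.0, 1.0]]
    T_transposed = matmul(matmul(inv3(Q), W), P)      # [T]' = [inv[Q]]*[W]*[P]
    return transpose(T_transposed)

Q_PTS = [(-30.0, 80.0), (-100.0, -20.0), (140.0, 50.0), (-40.0, -240.0)]
P_PTS = [(-0.071263, 0.029665), (-0.053033, -0.006379),
         (-0.014063, 0.061579), (0.080120, -0.030305)]
T = collineation(Q_PTS, P_PTS)
```

With [W] normalized so that its last diagonal entry is 1, the third correspondence maps with scale factor k3 = 1, which fixes the overall scale of T.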


FIGURE 2  GEOMETRY OF THE LOCATION DETERMINATION

FIGURE 3  GEOMETRY OF THE P2P PROBLEM

FIGURE 4  GEOMETRY OF


FIGURE 5  AN EXAMPLE SHOWING FOUR DISTINCT
SOLUTIONS TO A P3P PROBLEM


Consider the tetrahedron in Figure 5a. The base ABC is an
equilateral triangle and the "legs" (i.e., LA, LB, and LC) are all
equal. Therefore, the three face angles at L (i.e., <ALB, <ALC,
and <BLC) are all equal. By the law of cosines we have:

Cos(Alpha) = 5/8.

This tetrahedron defines one solution to a P3P problem. A second
solution is shown in Figure 5b. It is obtained from the first by
rotating L about BC. It is necessary to verify that the length of
L'A can be 1, given the rigid triangle ABC and the angle alpha.
From the law of cosines we have:

(2*sqrt(3))^2 = 4^2 + (L'A)^2 - 2*4*(L'A)*(5/8)

which reduces to:

(L'A - 1) * (L'A - 4) = 0.

Therefore, L'A can be either 1 or 4. Figure 5a illustrates the
L'A = 4 case and Figure 5b illustrates the L'A = 1 case.

Notice that repositioning the base triangle so that its vertices
move to different locations on the legs is equivalent to
repositioning L. Figure 5c shows the position of the base triangle
that corresponds to the second solution.

Since the tetrahedron in Figure 5a is threefold rotationally
symmetric, two more solutions can be obtained by rotating the
triangle about AB and AC.


Figure 6a specifies a P4P problem and demonstrates one solution.
A second solution can be achieved by rotating the base about BC so
that A is positioned at a different point on its leg (see Figure
6b). To verify that this is a valid solution consider the plane
X = 0, which is normal to BC and contains the points L, A, and D.
Figure 6c shows the important features in this plane. The cosine
of alpha is 119/169. A rotation of beta about BC repositions A at
A'. The law of cosines can be used to verify the position of A'.


To complete this solution it is necessary to verify that the
rotated position of D is on LD. Consider the point D' in Figure
6c. It is at the same distance from P as D is and by the law of
cosines we can show that gamma equals beta. Therefore, D', which
is on LD, is the rotated position of D. The points A', B, C, and
D' form the second solution to the problem.


FIGURE 6  AN EXAMPLE OF A P4P PROBLEM
WITH TWO SOLUTIONS

FIGURE 9  COMPUTING THE 3-D LOCATION OF
THE CENTER OF PERSPECTIVE (L)

FIGURE 10  GEOMETRY OF THE P4P PROBLEM
(WITH ALL CONTROL POINTS
LYING IN A COMMON PLANE)


SEMANTIC  DESCRIPTION  OF  AERIAL  IMAGES 
USING  STOCHASTIC  LABELING 


O.D.  Faugeras  and  K.E.  Price 


Image  Processing  Institute 
University  of  Southern  California 
Los  Angeles,  California  90007 


ABSTRACT 

This  paper  discusses  the  application  of 
stochastic  labeling  to  a  general  symbolic  image 
description  problem.  A  method  used  to  compute 
initial  likelihoods  and  compatibilities  is 
described.  It  was  derived  from  an  earlier 
symbolic  matching  procedure,  but  was  modified  to 
provide  the  data  needed  for  application  of  the 
labeling  method.  This  labeling  procedure 
differs  from  simpler  ones,  in  that  it  maximizes 
a  global  criterion  at  each  iteration.  This 
technique  is  compared  to  other  matching  methods 
and  results  on  two  scenes  are  presented. 

INTRODUCTION

The purpose of computer scene analysis is to automatically produce
a description of the content of an image similar to one obtained
from a skilled human observer. In order to achieve such a goal, a
symbolic description of the raw image data must be constructed.
This requires the application of many of the now well developed
techniques of Image Processing (image bandwidth compression, image
restoration and image enhancement), extraction of features such as
texture and edges, segmentation of the image into homogeneous
regions with respect to one or several properties, and measuring
features that characterize these segments (color, brightness,
texture, size, and shape) and also relations between these
segments (brighter than, larger than, above, below, etc.). The
output of this complex sequence of processes is something that
does not resemble the original input array of pixels but is much
more suitable for high level processing: a symbolic description
which is represented as a labelled graph or semantic network. To
proceed any further, we must also assume that we have access to
another body of knowledge containing a priori information about
the expected content of images of a given area. We will not make
any assumptions about how this world model has been obtained
(manual input or intelligent learning) and will only assume that
its representation is the same as the image, i.e., that it is also
a semantic network. The process of obtaining a semantic
description of a given image can then be viewed as finding the
solution of a graph matching problem: either match the image onto
the model or match the model onto the image. Thus, this is
equivalent to the general graph/subgraph isomorphism problem which
is well known to be NP-complete. Practical and useful solutions
can nonetheless be found as we will show in this paper.

Matching techniques must be described in terms of performance on a
well defined problem. The particular task under study is the
analysis of an image of a scene using an approximate specification
of the scene which would apply for many different images of the
scene (i.e. a model). Both the image and scene are represented by
semantic networks. The image description is automatically
generated and thus reflects any errors in the segmentation
process. The model is specified by the user and contains only the
important objects and relations.

A similar problem was attacked by Rubin [1] with a search
procedure and a more detailed model. His work has been combined
with a relaxation procedure by Smith [2]. The general form of the
solution is similar to ours but the scene model is significantly
more detailed so that exact comparisons cannot be made.

We will first present the image and model descriptions, which are
the input to the matching procedure, then the basic matching
technique which we use to compare two objects. Next, the global
matching process (labeling) will be described. Finally, we will
discuss the results of applying this procedure.


DESCRIPTION  AND  BASIC  MATCHING  TECHNIQUES 


This section will first present a description of the symbolic
representation used for the scene and image. Then the method for
computing initial likelihoods of particular matches is described
along with the method to compute compatibilities of pairs of
matches.

1.  Image And Model Descriptions


Our matching system uses a feature based, symbolic description of
an idealized scene (the model) and of the image of a portion of
this scene [3]. The image description is derived automatically
from an image and the model is developed by the user through an
interactive procedure.


The basic objects used for the image description are the segments
of the image generated both by a general region based image
segmentation procedure [4] and by a linear feature extraction
procedure [5]. The regions are derived by locating connected areas
which are uniform with respect to some feature in the input image
(color parameters, texture values, etc.). Linear features are
defined as long narrow objects which differ from the background on
both sides, and are described as a sequence of straight line
segments with some small width. Typically, the images which we use
have a total of 100-200 individual segments of both types. The
symbolic description is completed by computing various features of
the segments and relations between them.

The features used for the symbolic description are those which can
be reliably computed from the available data (the input image and
the region or line descriptions). They include properties such as
average color and texture (currently only simple texture
measures), size, position, two-dimensional orientation, and simple
shape measures. Also included are various relations between image
segments such as adjacency, nearby, and relative positions (above,
below, etc.). With all these relations a segment may easily be
related to as many as 100 other segments. This description is not
intended to be used for reconstruction of the original image; it
is meant to capture the important, observable information
contained in the image.

The model description is identical to that used for the image:
feature based descriptions of basic region-like and line-like
elements including relations between them. Additionally, the basic
elements in the model are grouped into more complex objects,
associated with generic descriptions, and referred to by actual
names. The feature values in the model will not correspond exactly
to image values, but are approximations of the likely values, so
that one model of a scene can be used with many similar images of
the same scene. The relations between elements which are included
in the model are the "important" ones; that is, if a relation
appears in the model description then it is expected to occur in
the image description, but if no relation occurs in the model then
nothing may be said about its appearance in the image (negative
relations could be used, such as must not be adjacent, etc.).
Similarly, only the important objects are described in the model,
thus it is not a complete description of the entire scene.
Generally, the model description is smaller than the image
description, containing 20-30 basic elements.

In  summary,  the  input  for  the  matching 
procedures  is  the  symbolic  descriptions  of  both 
the  input  image  and  a  model  of  the  scene.  The 
model  description  determines  the  outcome  of  the 
matching  operations.  The  image  description  is 
automatically derived and may contain
errors  -  especially  where  simple  objects  are 
broken  into  several  pieces.  The  model 


description  is  incomplete,  as  it  contains  only 
important  objects,  thus  most  segments  in  the 
image  (and  objects  in  the  actual  scene)  will  not 
be  described  by  objects  in  the  model. 

2.  Basic  Matching  Technique 

The global matching procedure is a stochastic labeling procedure
(also called relaxation procedure) which will be explained in
detail in the next section. The relaxation technique requires a
basic procedure to compare how well a model element agrees with an
image segment for its operation. In our previous work in matching
pairs of images [2] and in analysis of images using models of the
scene [6] we have developed a comparison procedure which can rate
the correspondence of an object in one image (or model) with an
object in another image (or model). The basic procedure combines
differences in all available features and relations to produce a
single rating of the quality of the match. The past experiments
indicated that this procedure produces reliable, and generally
accurate measures for the differences between two objects (i.e.
the model based matching performed accurately on a variety of
scenes). The problems with the past matching system arose in the
use of these results by the global matching procedure,
particularly in requirements for ordering the selection of
elements to match and the handling of objects which break into
several pieces. (This is discussed in more detail in part IV.)

Briefly, the basic matching procedure combines differences in all
feature values which are weighted to account for the different
ranges of values (small size differences (1000 pixels) are not as
important for large regions but small changes in orientation (0.5
radian) are very significant). Additionally, differences in the
number of relations in the model and the number of the same
relations between the corresponding image segments are used. All
of these components are given a strength (high, medium, low) to
control their impact on the final match result (i.e. features
known to be marginally useful are given a low strength, and those
considered very important are given a high strength). In the
earlier implementation the absolute value of the rating ranged
from 0 upward to 10000 or more. These values are converted to the
range [0.0, 1.0] by using the reciprocal of the value plus 1,
i.e. 1/(value + 1).
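The conversion just described, together with the pruning and scaling rule this paper applies when it computes initial likelihoods (keep at most thirty candidates, stop where a score falls below 1/10 of the best, rescale to sum to 1), might be sketched as follows. The function name, the dictionary layout, and the parameter names are hypothetical; only the three numeric rules come from the text.

```python
def initial_likelihoods(ratings, max_candidates=30, cutoff=0.1):
    """ratings: {segment_id: raw difference rating >= 0} for one model element.

    Returns {segment_id: probability} for the retained candidate segments.
    """
    # Map each rating from [0, inf) to (0, 1]; a perfect match (0) scores 1.0.
    scores = {seg: 1.0 / (1.0 + r) for seg, r in ratings.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best = ranked[0][1]
    # Keep at most `max_candidates`, and none scoring below `cutoff` * best.
    kept = [(seg, s) for seg, s in ranked[:max_candidates] if s >= cutoff * best]
    total = sum(s for _, s in kept)
    # Scale so the retained scores sum to 1 and can be treated as probabilities.
    return {seg: s / total for seg, s in kept}

probs = initial_likelihoods({"r1": 0.0, "r2": 1.0, "r3": 3.0, "r4": 99.0})
```

In this toy input the fourth segment scores 0.01, below one tenth of the best score, so it is dropped before normalization.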

The stochastic labeling operation uses the matching procedure for
two distinct purposes. First, it is necessary to determine the
initial likelihood of a particular assignment (i.e. a rating
without consideration of neighbors, or in other words use only the
unary relations). Second, the compatibility of particular
assignments for two objects must be computed (i.e. the rating
using the relations between two objects).

The computation of initial likelihoods cannot use relations
between objects since their use depends on assignments of model
elements to image segments. Therefore, the initial match is
limited to feature values (color, size, shape, etc.). A model
element is compared with all the image segments using our basic
matching procedure. The best matching segments are kept for
further analysis; currently up to thirty are used, or up to the
point where the worst is 1/10 of the best (whichever gives the
smaller set). Then the match results are scaled so that they sum
to 1, to be treated as probabilities in the stochastic labeling
procedure.

Clearly, if feature values alone are sufficient to locate correct
matches, then the process could stop at this point. But, even
though features are sufficient for some well defined objects, they
do not locate most correspondences by themselves.

After an initial application of the relaxation procedure several
assignments may be made. At this point, the computation of initial
likelihoods for an object can, and does, use the relations with
assigned elements. This means that initial likelihoods are always
computed with all available information: initially only feature
values, then an increasing number of relations. Therefore
successive steps, which use more information, can more reliably
match the less well defined objects.

The compatibility measure computes the effect of making an
assignment for one element on the assignment for another element.
The interaction between objects and their assignments is through
the relations between them, so that the compatibility measure is
based on these relations. Here again, the same basic matching
procedure is used to compute how well the relations in the model
match the relations in the image. The procedure has been modified
so that only the relations between the two model elements and
between the potential corresponding image elements are considered.

In some implementations of a stochastic labeling procedure, all
the possible compatibility measures are computed once at the
beginning, but because of the total number of possibilities which
would be required, this cannot be done. Since only a small
fraction of the total number are ever required, they are computed
as needed. Clearly, in one experiment, the same value will be
computed several times, but the cost is small.

The compatibility measure is computed using the most likely
assignments for the second object. The individual matches are
weighted by the likelihood of a particular assignment so that more
likely assignments contribute more to the result. The number of
likely assignments to be considered is determined by an input
parameter. The greater the number of assignments, the greater time
the procedure will take. Experiments have indicated that using
more than one or two assignments does not contribute much to the
final results (see the final section). If an object has a firm
assignment (i.e. some segment has been selected as corresponding
to a model object) then this assignment is included in the
compatibility computation in addition to the number of assignments
specified by the parameter described above.

RELAXATION ALGORITHM

This section describes the basic stochastic labeling and
optimization algorithms. The second part of the section presents
the variations of the basic procedure which were required for this
symbolic matching problem.

1.  General Description

Relaxation labeling attempts to efficiently solve a very general
problem in Pattern Recognition and Artificial Intelligence: given
a set of units U and a set of names N, assign names to units given
actual measured features and a world model. There are two broad
classes of Relaxation techniques. The first one, called discrete
Relaxation, handles the case where, for every unit, we can know if
a name is possible or impossible. Discrete Relaxation is then an
efficient way to solve the search problem of finding a labeling of
the units. Continuous Relaxation handles the cases where we know
more than just whether names are possible or impossible, namely a
measure of their likelihood.

In the first case, the world model consists of a binary relation
$R \subset (U \times N) \times (U \times N)$ that determines
whether assigning name $n_1$ to unit $u_1$ is compatible with
assigning name $n_2$ to unit $u_2$. In the second case, this
compatibility is given by a positive number
$c(u_1, n_1, u_2, n_2)$, which is small if the compatibility is
weak and large if it is strong. Function $c$ is defined in general
only over a subset $S$ of $(U \times N) \times (U \times N)$. To
every unit $u_i$ and name $n_k$ we can thus associate the set
$V_i(k)$ of related units $u_j$ such that there exists a name
$n_l$ for which $(u_i, n_k, u_j, n_l)$ is in $S$.

In this paper, we are solely concerned with continuous Relaxation,
where for every unit $u_i$ there is a corresponding probability
vector $p_i = [p_i(1), \ldots, p_i(L_i)]^T$, where $p_i(k)$
($1 \le k \le L_i$) measures the probability that unit $u_i$ has
the name $n_k$. The set of all vectors $p_i$ is called a
stochastic labeling of the set of units U. As proposed in [7,8] we
can also define for each unit a compatibility or prediction vector
$q_i = [q_i(1), \ldots, q_i(L_i)]^T$ that tells us what $p_i$
should be, given the probability of assignments $p_j$ at
neighboring units $u_j$ and the world model embedded in function
$c$. For simplicity we will rewrite $c(u_i, n_k, u_j, n_l)$ as
$c(i,k,j,l)$ in what follows.

As described in [7,8] we can take

$$ q_i(k) = \frac{Q_i(k)}{\sum_{l=1}^{L_i} Q_i(l)} \qquad (1) $$

where

$$ Q_i(k) = \sum_{u_j \in V_i(k)} \; \sum_{n_l \in W_j} c(i,k,j,l) \, p_j(l) \qquad (2) $$

$W_j$ is a subset of the possible names for unit $u_j$. In the
work reported in [7,8] we took $W_j = \{1, \ldots, L_j\}$, but
because of the large number of possible names in this task and for
the sake of efficiency we took

$$ W_j = \{\text{set of } n \text{ most likely names}\} $$

usually with $n = 1$. That is, for every neighbor $u_j$ of a unit
$u_i$, we considered only the contribution of the most likely name
in Eq. (2).
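Equations (1) and (2), with $W_j$ restricted to the n most likely names, can be sketched as below. The data layout is an assumption made for illustration: `p` is a list of probability vectors, `c` is a dictionary keyed by `(i, k, j, l)` in which missing keys mean "not in S", and `neighbors[i]` lists the units related to unit i (a single neighbor list per unit is used in place of the name-dependent sets $V_i(k)$).

```python
def most_likely_names(p_j, n=1):
    """Indices of the n most probable names of one unit (the set W_j)."""
    return sorted(range(len(p_j)), key=lambda l: p_j[l], reverse=True)[:n]

def prediction_vector(i, p, c, neighbors, n=1):
    """Return q_i of Eq. (1), with Q_i(k) accumulated as in Eq. (2)."""
    Q = []
    for k in range(len(p[i])):
        total = 0.0
        for j in neighbors[i]:
            for l in most_likely_names(p[j], n):
                total += c.get((i, k, j, l), 0.0) * p[j][l]
        Q.append(total)
    norm = sum(Q)
    if norm == 0.0:
        # No support from any neighbor: fall back to a uniform prediction.
        return [1.0 / len(Q)] * len(Q)
    return [value / norm for value in Q]

# Two units, two names each; unit 1 most likely carries name 0.
p = [[0.5, 0.5], [0.9, 0.1]]
c = {(0, 0, 1, 0): 2.0, (0, 1, 1, 0): 1.0,
     (0, 0, 1, 1): 0.1, (0, 1, 1, 1): 5.0}
neighbors = {0: [1], 1: [0]}
q0_top1 = prediction_vector(0, p, c, neighbors, n=1)
q0_all = prediction_vector(0, p, c, neighbors, n=2)
```

Comparing `q0_top1` with `q0_all` shows what the n = 1 shortcut discards: the strong compatibility attached to the unlikely name of unit 1 no longer influences the prediction for unit 0.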

In [7,8] we proposed to use local measures of
consistency and ambiguity of the form

    \alpha \| p_i - q_i \|_2^2 + (1 - \alpha) H_i    (3)

where \| \cdot \|_2 is the usual Euclidean norm, H_i the
quadratic entropy

    H_i = \sum_{l=1}^{L_i} p_i(l) (1 - p_i(l)) = 1 - \| p_i \|_2^2    (4)

and \alpha a weighting factor adjusting the relative
importance of consistency versus ambiguity. It
was found in [9] that an even better measure is
given by the inner product

    p_i \cdot q_i    (5)

The global measure is then an average over the
set of units of the local measures, and we can use

    C = \sum_{\text{all units } u_i} p_i \cdot q_i    (6)

as a global criterion over the set of units that
measures the consistency and ambiguity of the
labeling.

The labeling problem is now equivalent to
the following:

    given an initial labeling p^{(0)}, find the
    local maximum of criterion C closest to p^{(0)}    (7)

We compute the local gradient vector g_i = \partial C / \partial p_i
and define an iteration scheme as:

    p^{(n+1)} = p^{(n)} + \rho_n P(g^{(n)}),    n >= 0    (8)

where \rho_n is a positive number and P a linear
projection operator. For a description of P see
[7,8].

It can be easily shown that:

    g_i(k) = q_i(k) + \sum_{\text{neighbors } u_j \text{ of } u_i} \frac{1}{\sum_{m} Q_j(m)} \sum_{l \in W_j} c(j,l,i,k) \, (p_j(l) - p_j \cdot q_j)    for k = 1, ..., L_i    (9)

The first term q_i(k) in Eq. (9) corresponds to
the simple maximization of the product p_i \cdot q_i in
the global criterion C, whereas the second term
corresponds to the coupling between units
through the compatibility relations. The
iteration scheme defined by Eq. (8) will allow us to
evolve from the initial stochastic labeling
toward a less ambiguous and more consistent
labeling.
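
As a check on the gradient expression, the following sketch evaluates the criterion as a sum of inner products p_i . q_i and the gradient as q_i(k) plus the coupling term, so the two can be compared numerically. It is a simplified illustration, not the authors' implementation: every unit is treated as a neighbor of every other unit, W_j is taken to be all names of u_j, all supports are assumed strictly positive, and the projection operator P is left out because it is only described in [7,8]. All function names are hypothetical.

```python
def support(i, p, c):
    """Q_i(k): support for each name k of unit i from all other units."""
    return [sum(c.get((i, k, j, l), 0.0) * p[j][l]
                for j in p if j != i
                for l in range(len(p[j])))
            for k in range(len(p[i]))]

def prediction(i, p, c):
    """q_i: the support vector normalized to sum to one."""
    Q = support(i, p, c)
    total = sum(Q)
    return [x / total for x in Q]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def criterion(p, c):
    """C: sum over all units of the inner product p_i . q_i."""
    return sum(dot(p[i], prediction(i, p, c)) for i in p)

def gradient(i, p, c):
    """g_i(k) = q_i(k) + coupling term through the compatibilities."""
    g = prediction(i, p, c)
    for j in p:
        if j == i:
            continue
        D_j = sum(support(j, p, c))          # normalizer of q_j
        pq = dot(p[j], prediction(j, p, c))  # p_j . q_j
        for k in range(len(p[i])):
            g[k] += sum(c.get((j, l, i, k), 0.0) * (p[j][l] - pq)
                        for l in range(len(p[j]))) / D_j
    return g
```

Comparing gradient(i, p, c) against central finite differences of criterion(p, c) on a small example is a quick way to confirm that the analytic form and the criterion agree.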

2.  Least  Commitment  Versus  Speed 
Multiple  Matches 

The task of matching a model with a
symbolic representation of an image obtained by
an automatic segmentation procedure presents the
problem that the match is not necessarily
unique. For example, if a highway in the image
has been separated by the segmentation procedure into
several disconnected pieces, then each
piece is a potential correct match for the node
"highway" in the model.

One possible way of handling this problem
would be [...] general
tendency to [...]
iterative scheme. [...]
alternative between an increase in speed (and
therefore in the probability of making errors)
and the principle of least commitment.

We therefore introduced the notion of a
macro iteration composed of several of the
iterations defined by Eq. (8) (micro-iterations),
after which decisions are made to assign names
to units, based upon the comparison of the
components of the vectors p_i to a threshold
(usually 80%). If one component p_i(k) is larger than
the threshold, unit u_i
is considered to be assigned the
corresponding name n_k.
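
The decision step that closes a macro iteration can be sketched as below. This is a minimal illustration: the function name and the data layout are hypothetical, and only the largest component of each vector is tested against the threshold.

```python
def firm_assignments(p, names, threshold=0.8):
    """Return {unit: name} for units whose largest probability exceeds threshold.

    p:     dict unit -> list of probabilities p_i(k)
    names: dict unit -> list of candidate names, aligned with p[unit]
    """
    assigned = {}
    for unit, probs in p.items():
        # Index of the most likely name for this unit.
        k = max(range(len(probs)), key=lambda idx: probs[idx])
        if probs[k] > threshold:
            assigned[unit] = names[unit][k]
    return assigned
```

Units that stay below the threshold remain unassigned and continue to evolve in the next macro iteration, which is how the scheme trades speed against least commitment.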

The process is then reinitialized by
computing new initial probabilities for the
units u_i which have been assigned names [...] these
[...] probabilities only when units are assigned
(otherwise only features are used). This also
has the advantage of improving the original
estimates.


In the optimization problem (7) the
constraints are now:

    \sum_{k=1}^{L_i} p_i(k) = k_i + 1    for all i,
    p_i(l) = 1    for all names l which have already been assigned to unit u_i.

That is, we do not allow the confidence value
of already assigned names to be changed from 1.


COMPARISON  WITH  OTHER  APPROACHES 
AND  EXPERIMENTAL  RESULTS 


We compared our approach with results from
two other techniques: the relaxation procedure
originally introduced by Rosenfeld, Hummel and
Zucker [10] (algorithm RHZ) and the sequential
matching procedure developed by Nevatia and
Price [6] (algorithm NP). The nonlinear
iterative updating formula of algorithm RHZ can
be expressed as

    p_i^{(n+1)}(k) = \frac{p_i^{(n)}(k) \, Q_i(k)}{\sum_{l} p_i^{(n)}(l) \, Q_i(l)}    (10)

where the Q_i's are computed in the same way as
in Eq. (2). On this particular problem,
algorithm RHZ has a tendency to converge toward
ambiguous solutions. The reason is, of course,
that this algorithm does not take into account
completely the coupling between units as
reflected by the c(i,k,j,l)'s and fails to
account for the notion of ambiguity. As shown
in Eqs. (6) and (8) this is not the case with
the algorithm described in this paper.
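
For comparison, one RHZ update at a single unit, in the normalized-product form of Eq. (10), can be sketched as follows. The function name is hypothetical and the supports Q_i(k) are assumed to have been computed beforehand as in Eq. (2).

```python
def rhz_update(p_i, Q_i):
    """One RHZ step: multiply each probability by its support, then renormalize."""
    weighted = [p * q for p, q in zip(p_i, Q_i)]
    total = sum(weighted)
    return [w / total for w in weighted]
```

When two names receive equal support the update leaves their probabilities unchanged, which is one simple way to see how the procedure can settle on an ambiguous labeling.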


The  basic  comparison  procedure  used  here  is 
the  same  as  used  in  algorithm  NP ,  so  that  many 
of  the  results  are  identical.  The  major 
problems  with  the  sequential  method  are  the  lack 
of  a  consistent  method  for  the  handling  of 
multiple  assignments  and  errors  which  are  caused 
by  the  order  in  which  the  objects  are  selected 
for  matching.  For  example,  if  a  pair  of 
identical  objects  are  near  each  other,  the 
sequential  procedure  can  easily  find  the  wrong 
assignment  for  the  first  object,  thus  forcing 
the  second  to  also  be  incorrect,  due  to  the 
limit  of  assigning  one  name  to  only  one  unit 
(units  may  be  assigned  several  names).  Since 
the  relaxation  procedure  is  considering  pairs  of 
objects,  the  correct  assignments  advance  to  the 
top,  if  there  are  relations  connecting  the  two 
objects . 

1.  Results 

We have applied this procedure to several
different scenes or portions of scenes and will
present results from two of these scenes. The
first is a high altitude aerial image with a few
major regions (a city and rural areas) and a
number of linear features (a major highway,
river channels, and roads along the channels)
(see Fig. 2). The second is a closer view (of a
different area), with 12 roads included in the
model. The roads are described in sections
because of the tendency of the linear feature
extraction procedure to break long linear
features into shorter segments (see Fig. 3).

To illustrate how the relaxation procedure
operates, Fig. 1 is a graph of the probabilities
of various assignments for one object through a
series of macro iterations. The values are
given for each micro-iteration through a long
sequence of macro iterations. Figures 2 and 3
show the final results, with the objects
outlined and labeled. Many of the labels
overlap since adjacent linear features are being
presented.

REFERENCES 

1. S. Rubin, "The ARGOS Image Understanding
System," Ph.D. thesis, Computer Science
Department, Carnegie-Mellon U., Pittsburgh, PA,
1978.

2. D. Smith, "Search Strategies for the ARGOS
Image Understanding System," in Proc. Image
Understanding Workshop, Los Angeles, CA, Nov.
1979, pp. 42-46.

3. K. Price and R. Reddy, "Matching Segments of
Images," IEEE Trans. PAMI, Vol. 1, Jan. 1979, pp.
110-116.

4. R. Ohlander, K. Price and R. Reddy, "Picture
Segmentation Using a Recursive Region Splitting
Method," Comp. Graphics and Image Processing,
Vol. 8, pp. 313-333, 1978.

5. R. Nevatia and K.R. Babu, "Linear Feature
Extraction and Description," to appear in
Computer Graphics and Image Processing.

6. R. Nevatia and K. Price, "Locating
Structures in Aerial Images," submitted for
publication.

7. O.D. Faugeras and M. Berthod, "Scene
Labeling: An Optimization Approach," Proceedings
of the IEEE Computer Society Conference on
Pattern Recognition and Image Processing,
pp. 318-396, Chicago, August 6-8, 1980.

8. O.D. Faugeras and M. Berthod, "Improving
Consistency and Reducing Ambiguity in Stochastic
Labeling: An Optimization Approach," submitted
to the IEEE Trans. on Pattern Analysis and
Machine Intelligence, November 1979.

9. M. Berthod and O.D. Faugeras, "Using Context
in the Global Recognition of a Set of Objects:
An Optimization Approach,"

10. A. Rosenfeld, R.A. Hummel and S.W. Zucker,
"Scene Labeling by Relaxation Operations," IEEE
Trans. on Syst., Man, and Cybern., SMC-6,
No. 6, pp. 420-453, June 1976.


Fig. 1. Graph of the match likelihoods for several image segments to be
assigned to one model element (horizontal axis: iteration number). The
model element being considered is labeled "SOUTH-HIGHWAY" in the final
results of Fig. [...]. The tick marks along the horizontal axis indicate
each macro iteration. Note also that the [...] to 1.0 are all correct
assignments, and that the initial best assignment is not considered after
the firm assignment made on the second macro iteration.


Fig. 2. Final assignments for
Stockton area.

Fig. 3. Final assignments for
road segments of Ft.
Belvoir area.


REPRESENTING  AND  REASONING  ABOUT  PARTIALLY  SPECIFIED  SCENES 


Rodney A. Brooks and Thomas O. Binford


Artificial  Intelligence  Laboratory,  Computer  Science  Department 
Stanford  University,  Stanford,  California  94305 

Abstract 


We  present  a  representational  scheme  for  specific  and 
generic  objects,  partially  specified  scenes  and  partially  specified 
camera  models.  It  is  built  on  top  of  our  previous  geometric 
volume  based  representations,  and  relies  on  specialization  by 
constraints  as  its  primitive  element.  We  extend  the  specialization 
mechanism  to  enable  case  analysis  in  determining  observational 
invariants  of  objects.  A  set  of  rules  is  given  which  aids  in 
reasoning  about  geometric  relationships  between  coordinate 
systems linked by multiple partially specified coordinate
transforms.  Together  these  two  mechanisms  provide  powerful 
methods  for  predicting  the  appearances  of  objects.  We 
demonstrate  how  to  make  use  of  two  dimensional  match 
information  to  interpret  the  image  in  a  three  dimensional  model 
of  the  world.  Again  we  make  use  of  the  specialization 
mechanism. 


1. Introduction

The development of the ACRONYM model-based vision
system has been previously described ([4], [5] and [6]). A brief
overview of the system follows. This is intended to give a firm
basis for the discussion of solutions to computational problems
which arise when image interpretation is viewed as an
interaction of prediction and description. This paper will be
concerned primarily with prediction and use of predictions to
interpret descriptions produced by low level image processing.

Figure 1 shows a block diagram of the major modules
and data structures of ACRONYM. The data structures
comprise the middle column. A user interacts with the system
via a high level modeling language and an interactive editor to
provide three dimensional descriptions of objects and object
classes. A graphics module provides valuable feedback during
this task. The models so constructed are volumetric descriptions
based on generalized cones [1]. The models are tree structured
and provide multiple levels of detail in their representation. All
this has previously been described in detail (e.g. [6]). In section 2
we describe a new level of representation built on top of this.

A rule-based module, the predictor and planner, takes
models of objects and scenes and produces the observability
graph, which is a prediction of the appearance of the objects
expected within the scene, and a plan, or instructions, for
descriptive processes and the matcher to find instances of the
objects within the image. The process of prediction and
planning is repeated as first coarse interpretations are found,
more predictions are carried out, and finer, less ambiguous
interpretations are produced. Section 3 describes some new
mechanisms used for prediction, and the way they fit in with
natural extensions to the additions to object representation
described in section 2.

Fig. 1. The ACRONYM system.


Consider the lower portion of figure 1. The edge mapper
describes pictures as ribbons and their spatial relationships.
Ribbons are the two dimensional specialization of generalized
cones [4]. Currently we deal only with monocular images,
making use of the line finder of Nevatia and Babu [7] and the
goal directed ribbon finder of Brooks [3]. Again, this edge
mapping module may be invoked many times during the course
of an interpretation as finer levels of interpretation are called
for. Later ACRONYM will use stereo pairs of images and the
surface mapper will extract depth information in the form of
surface descriptions.


The matcher provides the interface between description
and prediction. In all the image interpretations done by
ACRONYM to date the matcher has been a syntactic graph
matcher written by Russell Greiner [6]. It has been used to
match the two dimensional predictions of the Observability
graph to the two dimensional descriptions of the Picture graph.
In section 3 we give some details of the design of a new
matcher to be implemented in the same rule set as that used for
the predictor and planner. It will directly relate two dimensional
descriptions back to three dimensional models, providing a
strong mechanism for ensuring global consistency of local
interpretations. It is expected that the matcher will gradually
merge with the predictor and planner since they will interact
with each other to a much greater degree in the new
implementation.

2. Representation

We have previously ([4], [5], [6]) discussed ACRONYM's
geometric representation of objects as frames using generalized
cones [1] as a volume primitive. We have alluded to, but not
explained in detail, representation of generic classes and specific
objects, explicit use of symmetry and representation of camera
parameters. In this section we explain the aspects of our
representation scheme devoted to these issues by means of an
extended example.

The choice of representation scheme is driven both by (1)
what we wish to represent, and (2) the class of computations we
wish to base upon the representation.

(1) We need to represent generic classes of objects. This involves
representing both variable sizes and variable structures with
local constraints between such variations. We also wish to
represent both subclasses and specific instances. Thus we will
need to talk about both specialization and generalization. Our
representation scheme is built on the specialization primitive.
Thus generalization at the input level must be restructured to be
a specialization at the representational level. We may
incorporate generalization directly later.

(2) We need to reason about object classes and their
specializations at the same time. This will allow matches to some
part of an object to be carried down to a match to the
corresponding part of a specialization of the object
automatically. We need to be able to reason both about multiple
instances of modeled objects, and multiple candidates for one or
more instances, at the same time. Local interpretations provide
more global constraints, but we need to know how global these
constraints can safely be made. This must be explicitly
represented in the model. For instance constraints on the length
of a wing deduced from a local interpretation extend over both
wings of a single aircraft, but they do not extend globally over
all aircraft as there may be many types of aircraft in the image.

In the following discussion we will consider the problem of
modeling the class of wide-bodied passenger jet aircraft, and
specific wide-bodied passenger jet aircraft, such as the Boeing
747, Lockheed L-1011, McDonnell-Douglas DC-10 and the
Airbus Consortium A-300. We will then model a wider situation
where such aircraft are on runways and taxiways, and there are
undetermined variables in the camera model.

Quantifiers

To represent the class of wide-bodied passenger jet
aircraft we need to represent both variations in size (e.g.
different aircraft subclasses will have different fuselage lengths),
and variations in structure (e.g. different aircraft subclasses will
have different engine configurations). In both cases we need to
represent the range of allowable variations. We refer to this as
quantification of sets. Furthermore, there will sometimes be
interdependencies between these variations (e.g. a scaling
between fuselage length and wing span).

The primitive representational mechanism used in
ACRONYM is that of units and slots. These are a slight
generalization of atoms and properties of LISP. Objects are
represented by units, as are generalized cones, cross-sections,
sweeping-rules, spines, rotations and translations to name the
more important ones. Figure 2 shows four units with their slots
and fillers from a particular ACRONYM model. They describe
the generalized cone representing the fuselage of the generic
wide-bodied passenger jet aircraft. Note that units are referred
to as "Nodes" because they are nodes of the Object graph of
figure 1. The NAME slot is a distinguished slot which all units
possess. It describes the entity represented by the unit and
corresponds to the SELF slot of KRL units ([2]). Units
identified by "Z" followed by a four digit number are those
which were given no explicit identifier by the user who
modelled the object. The modeling language parser has
generated unique identifiers for them.


Node: FUSELAGE-CONE
    NAME:           SIMPLE-CONE
    SPINE:          Z0005
    SWEEPING-RULE:  CONSTANT-SWEEPING-RULE
    CROSS-SECTION:  Z0004

Node: Z0005
    NAME:           SPINE
    TYPE:           STRAIGHT
    LENGTH:         FUSELAGE-LENGTH

Node: CONSTANT-SWEEPING-RULE
    NAME:           SWEEPING-RULE
    TYPE:           CONSTANT

Node: Z0004
    NAME:           CROSS-SECTION
    TYPE:           CIRCLE
    RADIUS:         FUSELAGE-RADIUS

Fig. 2. Generalized cone representation of fuselage.

The value of a slot is given by its filler. Slot fillers may be
immediate values, such as "2" or "STRAIGHT". They can also
be symbolic constants in the same sense as constants are used in
programming languages such as PASCAL. Such fillers are fine
for representing specific completely determined objects and
situations. To allow for representation of variable quantities
slots may be filled by a quantifier. A quantifier is a symbolic
placeholder for a slot filler. "FUSELAGE-LENGTH" and
"FUSELAGE-RADIUS" are examples of such quantifiers in
figure 2. Constraints can be placed on the values they are
allowed to represent.

The following constraints might be imposed upon
FUSELAGE-LENGTH and FUSELAGE-RADIUS when
modeling the class of wide-bodied passenger jet aircraft:

FUSELAGE-LENGTH: (λ (V) (INTERVAL V 40.0 60.0))
FUSELAGE-RADIUS: (λ (V) (INTERVAL V 2.5 4.0))

These constrain the two quantifiers to be floating point
quantities lying in the closed intervals [40.0, 60.0] and [2.5, 4.0]
(the units are in meters). Thus the constraints say that a
wide-bodied passenger jet aircraft has fuselage length ranging
from 40 to 60 meters, and fuselage radius from 2.5 to 4 meters.
In general a quantifier can have an arbitrary number of such
constraints. We will discuss later the way in which these
constraints are manipulated and used during prediction and
interpretation.
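
One way to picture quantifier constraints is as predicates attached to a symbolic name, as in the sketch below. This is an illustrative rendering in Python rather than ACRONYM's own LISP-based representation; the helper names and the numeric bounds are assumptions chosen for the example.

```python
def interval(lo, hi):
    """Constraint: the value lies in the closed interval [lo, hi]."""
    return lambda v: lo <= v <= hi

# Each quantifier may carry an arbitrary number of constraints.
# The bounds below are illustrative values in meters.
constraints = {
    "FUSELAGE-LENGTH": [interval(40.0, 60.0)],
    "FUSELAGE-RADIUS": [interval(2.5, 4.0)],
}

def satisfies(quantifier, value):
    """True if the value meets every constraint placed on the quantifier."""
    return all(check(value) for check in constraints[quantifier])
```

Appending a further predicate to a quantifier's list narrows the set of values it may represent, which is the sense in which constraints specialize a model.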

Such quantifiers are thus used to express allowable
variations in size of objects. They are also used to express
variations in the allowable structure of objects. Figure 3 gives
the complete subpart tree for a model of generic wide-bodied
passenger jet aircraft. For brevity not all the slots of the
OBJECT units are shown here. The QUANTIFIERS slot is
explained in section 1. The SUBPARTS slot of an OBJECT
unit is filled with a list of subparts giving the next level of
description of the object. Entries in the list can be simple
pointers to other OBJECT units (e.g. JET-AIRCRAFT has
three substructures: STARBOARD-WING, PORT-WING and
FUSELAGE). They can also be more complex such as the single
entry for the subparts of STARBOARD-WING, which specifies
a quantification of subparts called STARBOARD-ENGINE.
The quantification description follows the same rules as unit slot
fillers. In this case the quantification is the quantifier
F-ENG-QUANT. Note that PORT-WING has a quantification
of PORT-ENGINEs as subparts, which is represented by the
same quantifier F-ENG-QUANT. Thus we have explicitly
represented the symmetry of the aircraft; it has the same number
of engines attached to each wing. Constraints on this quantifier
and on R-ENG-QUANT, the number of rear engines, might be:

F-ENG-QUANT: (λ (I) (INTERVAL I 1 2))
R-ENG-QUANT: (λ (I) (INTERVAL I 0 1))
F-ENG-QUANT: (λ (I) (MEMBER (LIST I R-ENG-QUANT)
                             '((1 0) (1 1) (2 0))))

That is, there must be either one or two engines on each
wing, zero or one at the rear of the aircraft, and if there
are two on each wing then there are zero at the rear. We will
return to this example in section 4.
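
The combined effect of the engine-count constraints can be seen by enumerating candidate configurations, as in the sketch below. The function name and the search range are hypothetical; the allowed pairs follow the prose description (one or two engines per wing, zero or one at the rear, and none at the rear when there are two per wing).

```python
from itertools import product

def allowed_engine_configurations():
    """Return (engines per wing, rear engines) pairs meeting all constraints."""
    per_wing_ok = lambda f: 1 <= f <= 2      # interval constraint, per wing
    rear_ok = lambda r: 0 <= r <= 1          # interval constraint, rear
    joint = {(1, 0), (1, 1), (2, 0)}         # mutual (membership) constraint
    return [(f, r) for f, r in product(range(4), repeat=2)
            if per_wing_ok(f) and rear_ok(r) and (f, r) in joint]
```

Because the same per-wing count is used for both wings, each pair stands for a symmetric aircraft; the pair (2, 1) passes both interval tests but is excluded by the membership constraint.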

Symmetry of size (such as length of the wings) can
likewise be represented by using the same quantifier in more
than one place. Interdependencies [...] can be expressed by
constraints on the quantifiers.

Node: JET-AIRCRAFT
    NAME:             OBJECT
    SUBPARTS:         (STARBOARD-WING PORT-WING FUSELAGE)
    QUANTIFIERS:      (F-ENG-QUANT ENGINE-LENGTH
                       ENGINE-RADIUS
                       WING-ATTACHMENT ENG-OUT
                       ONE-WING-SPAN
                       WING-SWEEP-BACK
                       WING-LENGTH WING-RATIO
                       WING-WIDTH WING-THICK)

Node: STARBOARD-WING
    NAME:             OBJECT
    SUBPARTS:         ((SP-DES F-ENG-QUANT . STARBOARD-ENGINE))
    CONE-DESCRIPTOR:  STARBOARD-WING-CONE

Node: STARBOARD-ENGINE
    NAME:             OBJECT
    CONE-DESCRIPTOR:  PORT-ENGINE-CONE

Node: PORT-WING
    NAME:             OBJECT
    SUBPARTS:         ((SP-DES F-ENG-QUANT . PORT-ENGINE))
    CONE-DESCRIPTOR:  PORT-WING-CONE

Node: PORT-ENGINE
    NAME:             OBJECT
    CONE-DESCRIPTOR:  PORT-ENGINE-CONE

Node: FUSELAGE
    NAME:             OBJECT
    SUBPARTS:         (RUDDER STARBOARD-STABILIZER
                       PORT-STABILIZER)
    QUANTIFIERS:      (STAB-ATTACH STAB-WIDTH
                       STAB-THICK STAB-SPAN
                       STAB-SWEEP-BACK)
    CONE-DESCRIPTOR:  FUSELAGE-CONE

Node: RUDDER
    NAME:             OBJECT
    SUBPARTS:         ((SP-DES R-ENG-QUANT . REAR-ENGINE))
    CONE-DESCRIPTOR:  RUDDER-CONE

Node: REAR-ENGINE
    NAME:             OBJECT
    CONE-DESCRIPTOR:  REAR-ENGINE-CONE

Node: STARBOARD-STABILIZER
    NAME:             OBJECT
    CONE-DESCRIPTOR:  STARBOARD-STABILIZER-CONE

Node: PORT-STABILIZER
    NAME:             OBJECT
    CONE-DESCRIPTOR:  PORT-STABILIZER-CONE

Fig. 3. Subpart tree of generic passenger jet.



Our complete model for a generic wide-bodied passenger
jet aircraft has 28 quantifiers describing allowable variations in
size and structure.


Restriction  Nodes 

It should be clear that to model a subclass of wide-bodied
passenger jet aircraft we need only provide a different (more
restrictive) set of constraints for the quantifiers used in the
general model. To model a specific type of aircraft we could
force the constraints to be completely specific (e.g. (λ (V) (= V
4[...]))). Thus we will not need to distinguish between
specialization of the general model to a subclass, or an
individual. (Note that the notion of individual gets very
confused here. Should a completely specified model of a
747, or 747-B, be considered an individual or a class, since
there are multiple instances in the real world? By not
distinguishing between individuals and classes at the model
level we finesse the problem. If we were basing a natural
language system on our representation we might run into
problems with this.)

Given that subclasses use different sets of constraints, the
problem arises of how to represent multiple subclasses
simultaneously. We introduce a new type of node to the
representation: a restriction node. These are the embodiment of
specialization. Restriction nodes form a tree, rooted at a
distinguished node, the BASE-RESTRICTION. Constraints are
associated with a restriction node. A restriction node
implicitly inherits all the constraints associated with its ancestors
in the tree. Thus a daughter restriction node never has weaker
constraints than its parents.

In the case of the generic wide-bodied passenger jet
aircraft the constraints are associated with some restriction node,
GENERIC-JET-AIRCRAFT say. Its parent would most likely
be the BASE-RESTRICTION. To represent the class of 747s
the following restriction node might be included:

Node: BOEING-747
    NAME:         RESTRICTION
    PARENT:       GENERIC-JET-AIRCRAFT
    TYPE:         MODEL-SPECIALIZATION
    CONSTRAINTS:  <list of constraints>

The CONSTRAINTS slot would be filled with the
constraints additional to those in GENERIC-JET-AIRCRAFT
necessary for representing the subclass of Boeing 747s. This
restriction might in turn have daughter restrictions to represent
further subclasses such as SP-747s and 747-Bs.

Whenever a model is accessed (by the predictor and
planner say), it is accessed in the context of a restriction node.
Thus when reasoning about the generic class of wide-bodied
passenger jet aircraft the predictor and planner will access the
JET-AIRCRAFT model and base its reasoning on the
constraints given by the GENERIC-JET-AIRCRAFT
restriction node. When reasoning about Boeing 747s it will base
reasoning about the JET-AIRCRAFT model on the
constraints given by the BOEING-747 restriction node.
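
The inheritance of constraints along the restriction tree can be sketched as follows. The class and method names are hypothetical, constraints are modeled as simple predicates, and the numeric bounds used with it are illustrative only; this is not a description of ACRONYM's actual data structures.

```python
class RestrictionNode:
    """A node in the restriction tree; constraints are inherited from ancestors."""

    def __init__(self, name, parent=None, constraints=None):
        self.name = name
        self.parent = parent
        # quantifier name -> list of predicates added at this node
        self.constraints = dict(constraints or {})

    def all_constraints(self, quantifier):
        """Collect this node's constraints and those of every ancestor."""
        found, node = [], self
        while node is not None:
            found.extend(node.constraints.get(quantifier, []))
            node = node.parent
        return found

    def admits(self, quantifier, value):
        """True if the value satisfies every inherited constraint."""
        return all(check(value) for check in self.all_constraints(quantifier))
```

A daughter node can only add predicates, so in this sketch its admitted values are always a subset of its parent's, matching the statement that a daughter never has weaker constraints. The same model is then read through whichever restriction node supplies the context.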

Variable Affixments

Affixments are coordinate transforms between local
coordinate systems of objects. They consist of a rotation
and a translation.

Sometimes affixments vary over an object class. For
instance in the generic wide-bodied passenger jet aircraft the
position along the fuselage at which the wings will be attached
is variable. Allowable variation in an affixment within a single
object arises when an articulated object is modeled. Variable
affixments can also be useful for modeling allowed spatial
relationships between two objects - for instance an aircraft is on
a runway.

Notationally, we represent a vector as a triple (a,b,c) where
a, b and c are scalars. We represent a rotation as a pair <v,m>
where v is a vector, and m a scalar magnitude. An affixment
will be written as a pair (r,t) where r is a rotation and t a
translation vector. We will use some special vectors also: x̂, ŷ
and ẑ. We use * for the composition of rotations, and • for the
application of a rotation to a vector.
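
The notation can be made concrete with a small numerical sketch. Several conventions here are assumptions not fixed by the text: a rotation <v,m> is taken to be a rotation by m radians about the unit vector v, r1 * r2 applies r2 first, and an affixment (r,t) maps a point p to r • p + t. All function names are hypothetical.

```python
import math

def rotation(axis, angle):
    """3x3 matrix for <axis, angle> (unit axis, radians), by Rodrigues' formula."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [[t * x * x + c,     t * x * y - s * z, t * x * z + s * y],
            [t * x * y + s * z, t * y * y + c,     t * y * z - s * x],
            [t * x * z - s * y, t * y * z + s * x, t * z * z + c]]

def compose(r1, r2):
    """r1 * r2: the rotation that applies r2 first, then r1."""
    return [[sum(r1[i][k] * r2[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotate(r, v):
    """r . v: apply rotation r to vector v."""
    return tuple(sum(r[i][k] * v[k] for k in range(3)) for i in range(3))

def transform(affixment, point):
    """Apply affixment (r, t) to a point: rotate, then translate."""
    r, t = affixment
    return tuple(a + b for a, b in zip(rotate(r, point), t))

def invert(affixment):
    """Inverse affixment: (r^-1, -(r^-1 . t)); r^-1 is the transpose of r."""
    r, t = affixment
    r_inv = [[r[j][i] for j in range(3)] for i in range(3)]
    return (r_inv, tuple(-a for a in rotate(r_inv, t)))

X, Y, Z = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
```

Under these conventions the inverse of a composition <a,m1>*<b,m2> is <b,-m2>*<a,-m1>: the factors are reversed and the magnitudes negated, which holds whichever factor is applied first.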

By treating vectors and rotations as units within ACRONYM
itself, we can use the quantifier mechanism to represent
affixments which describe a class of coordinate
transforms. (In the ACRONYM implementation the allowable
fillers for rotations and translations are extended to
expressions involving quantifiers.) This gives symbolic
representations for rotations and translations.


Node: RUNWAY
    NAME:             OBJECT
    CONE-DESCRIPTOR:  Z0025

Node: Z0025
    NAME:             SIMPLE-CONE
    SPINE:            Z0026
    SWEEPING-RULE:    CONSTANT-SWEEPING-RULE
    CROSS-SECTION:    Z0027

Node: Z0026
    NAME:             SPINE
    TYPE:             STRAIGHT
    LENGTH:           RUNWAY-LENGTH

Node: CONSTANT-SWEEPING-RULE
    NAME:             SWEEPING-RULE
    TYPE:             CONSTANT

Node: Z0027
    NAME:             CROSS-SECTION
    TYPE:             RECTANGLE
    WIDTH:            RUNWAY-WIDTH
    HEIGHT:           0.0

Fig. 4. Model of runway.

Now consider the problem of representing the fact that an
aircraft is somewhere on a runway. Suppose the runway is
represented as in figure 4. In this case its coordinate system will
have its x axis centered along the length of the runway, the y
axis perpendicular at one end, and the positive z direction will
point up. Suppose that
the aircraft has its x axis running along the spine of the
fuselage, and has its z axis skyward for the standard orientation
of an airplane. Thus to represent the aircraft being on the
runway we could affix it with the following affixment:


(<ẑ, ORI>, (JET-RUNWAY-X, JET-RUNWAY-Y, 0))

where ORI, JET-RUNWAY-X and JET-RUNWAY-Y are
quantifiers with the following constraints:

JET-RUNWAY-X: (λ (V)
                 (INTERVAL V 0.0 RUNWAY-LENGTH))
JET-RUNWAY-Y: (λ (V)
                 (INTERVAL
                  V
                  (TIMES RUNWAY-WIDTH -0.5)
                  (TIMES RUNWAY-WIDTH 0.5)))

These constrain the aircraft to be on the runway, in the
normal orientation for an airplane (i.e. not upside down or any
such), but they do not constrain the direction in which the
aircraft is pointed. If we wished to constrain the aircraft to
approximately line up in the direction of the runway we could
include a constraint on the quantifier ORI, allowing for some
small error.

Partially Specified Camera Model

If we are to predict the appearance of objects it will help
to have some model of their position and orientation relative to
the camera. We have chosen to use a world coordinate system
into which both objects and camera are placed via affixment
type coordinate transforms. This allows the user modelling a
scene to think in familiar terms. The affixments so used are of
course allowed to include quantifiers in their specification.

The camera has a local coordinate system where the focal
point is centered at the origin, the viewing direction is along the
negative z axis, and the top of the image screen is in the
positive y direction.

Suppose we are modeling the situation where the camera
is onboard an aircraft flying over some scene and the camera is
pointed directly downwards. Suppose further that the exact
height of the aircraft is unknown. If the objects on the ground
are modeled with unknown ground coordinates (there may well
be mutual constraints between the quantifiers representing the
positions of the objects on the ground) then we can choose a
world coordinate system so that the camera is directly over the
origin, with its x and y axes directly over the world x and y
axes. Then letting HEIGHT be the quantifier representing the
height of the camera above ground we can affix the camera to
the world with:

(i, (0, 0, HEIGHT))

where i is the identity rotation. The quantifier HEIGHT might
be constrained according to some a priori knowledge of the
conditions under which the photographs to be analyzed were
taken. If we want to include the effects of roll and pitch of the
aircraft carrying the camera (yaw has already been taken care of
by our choice of world coordinates) we could apply two rotations
to the above affixment, obtaining the expression:

(<x̂, PITCH>*<ŷ, ROLL>, (0, 0, HEIGHT))

Again, the quantifiers PITCH and ROLL will have
constraints generated from knowledge of the conditions at the
time the photographs were taken.

When the predictor and planner needs to deduce the
position of an object relative to the camera coordinates it will
need to invert the above coordinate transform. This can be
achieved by a simple set of manipulation rules, and in this case
the result would be:

(<ŷ, -ROLL>*<x̂, -PITCH>,

<U.  -ROLL  >#<><,  -PI TCH>® (0,  0,  -HEIGHT)) 
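
The inverse affixment above can be checked numerically. The
following Python sketch is illustrative only and is not taken from
ACRONYM: it assumes that an affixment (R, t) maps a point p to
R⊗p + t, that rotations are right-handed, and it uses hypothetical
helper names (rot, invert_affixment) and made-up values for PITCH,
ROLL and HEIGHT.

```python
import math

X, Y, Z = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

def rot(axis, angle):
    """Right-handed rotation matrix about a unit axis (Rodrigues formula)."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    k = 1.0 - c
    return [[c + x*x*k,   x*y*k - z*s, x*z*k + y*s],
            [y*x*k + z*s, c + y*y*k,   y*z*k - x*s],
            [z*x*k - y*s, z*y*k + x*s, c + z*z*k]]

def compose(a, b):
    """Matrix product a*b: b is applied first, then a."""
    return [[sum(a[i][n] * b[n][j] for n in range(3)) for j in range(3)]
            for i in range(3)]

def apply(r, v):
    """Apply rotation r to vector v (written r⊗v in the text)."""
    return [sum(r[i][n] * v[n] for n in range(3)) for i in range(3)]

def invert_affixment(affixment):
    """Invert (R, t); the inverse is (R^-1, R^-1 ⊗ (-t))."""
    r, t = affixment
    r_inv = [[r[j][i] for j in range(3)] for i in range(3)]  # transpose
    return r_inv, apply(r_inv, [-c for c in t])

def transform(affixment, p):
    """Map point p through the affixment: R⊗p + t."""
    r, t = affixment
    return [a + b for a, b in zip(apply(r, p), t)]

# Illustrative values only (radians and metres); not from the paper.
PITCH, ROLL, HEIGHT = 0.05, -0.03, 3000.0
camera = (compose(rot(X, PITCH), rot(Y, ROLL)), [0.0, 0.0, HEIGHT])
inverse = invert_affixment(camera)

# Closed form stated in the text for the inverted affixment.
r_expected = compose(rot(Y, -ROLL), rot(X, -PITCH))
t_expected = apply(r_expected, [0.0, 0.0, -HEIGHT])
```

Under these assumptions the numerically inverted affixment agrees
with the closed form, and mapping a point through the camera
affixment and then its inverse returns the original point.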


3.  Geometric  Reasoning  and  Case  Analysis 

Invariant observables are image features which will be
observable over all possible viewing conditions (e.g.
collinearities, connectivity stemming from three dimensional
connectivity). Quasi-invariants are those that will be observable
over the possible range of variation within the modeled scene
(e.g. the fuselage of an aircraft on the ground will appear as a
straight ribbon in aerial images taken at any altitude).

The predictor and planner needs to be able to detect
when quasi-invariants are available. Thus it must be able to
reason about the spatial relationship between the camera and a
model. This relationship will usually be only partially specified,
and will involve a number of quantifiers. Often
quasi-invariants will not be directly available. Instead the
predictor and planner will need to break up the possible spatial
relationships into sub-cases where there are quasi-invariants.
Again it will need to reason about a chain of partially specified
coordinate transforms. In addition it will need to produce
observability predictions for the different sub-cases, where the
constraints on quantifiers have been specialized further than in
the modeled situation.

Reasoning  About  Coordinate  Transforms 

The orientation and position of an object relative to the
camera will be given by a series of affixments, which are pairs
of rotations and translations. The final orientation will be given
by the product of the rotational components of these affixments.
The relative orientation is important for making predictions of
the appearance of the object. The position is given by a sum of
applications of rotation expressions to individual translation
vectors. The positions of objects are important for determining
whether they are visible and for predicting spatial
relationships between objects in the image.

In this section we will only deal with methods for
manipulating and understanding products of rotations. These
methods are symbolic in nature and allow the predictor and
planner to reason about rotations which are expressed in terms
of quantifiers. We are working on methods for dealing with the
results of applying rotations to vectors, the other half of
understanding compositions of affixments. We have developed
some rules for this, but will report on them at a later date after
further work.

Rotations of three space form a group under the
operation of composition. They are associative but not
commutative. Commutativity would help greatly in simplifying
products of rotations, as it is easy to compose rotations which
share an axis into a single rotation. There is a slightly weaker
property of three dimensional rotations however. Its proof is
straightforward but tedious, and is omitted here. Let v1 and v2
be two three-space vectors and m1 and m2 be two scalars. Then
the following two identities are true:

<v1, m1>*<v2, m2>
    = <v2, m2>*<(<v2, -m2>⊗v1), m1>

<v1, m1>*<v2, m2>
    = <(<v1, m1>⊗v2), m2>*<v1, m1>
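
The two shift identities can be spot-checked numerically. The
sketch below is an illustration, not a proof and not part of
ACRONYM: it assumes <v, m> denotes a right-handed rotation of
magnitude m about the axis v, and the helper names and the sample
axes and angles are hypothetical.

```python
import math

def rot(axis, angle):
    """Right-handed rotation about `axis` (normalized here) by `angle`."""
    n = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / n for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    k = 1.0 - c
    return [[c + x*x*k,   x*y*k - z*s, x*z*k + y*s],
            [y*x*k + z*s, c + y*y*k,   y*z*k - x*s],
            [z*x*k - y*s, z*y*k + x*s, c + z*z*k]]

def compose(a, b):
    """Matrix product a*b."""
    return [[sum(a[i][n] * b[n][j] for n in range(3)) for j in range(3)]
            for i in range(3)]

def apply(r, v):
    """Apply rotation r to vector v."""
    return [sum(r[i][n] * v[n] for n in range(3)) for i in range(3)]

def close(a, b, tol=1e-9):
    """True when two 3x3 matrices agree entry by entry within tol."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))

def shift_left(v1, m1, v2, m2):
    """<v1,m1>*<v2,m2> rewritten as <v2,m2>*<(<v2,-m2>⊗v1), m1>."""
    return compose(rot(v2, m2), rot(apply(rot(v2, -m2), v1), m1))

def shift_right(v1, m1, v2, m2):
    """<v1,m1>*<v2,m2> rewritten as <(<v1,m1>⊗v2), m2>*<v1,m1>."""
    return compose(rot(apply(rot(v1, m1), v2), m2), rot(v1, m1))

# Arbitrary sample axes and magnitudes (radians), for illustration only.
v1, m1 = (1.0, 2.0, -0.5), 0.7
v2, m2 = (-0.3, 0.4, 1.2), -1.9
product = compose(rot(v1, m1), rot(v2, m2))
swapped = compose(rot(v2, m2), rot(v1, m1))
```

For these sample values `product` matches both shifted forms, while
`swapped` differs from it, which is the lack of commutativity that
the identities work around.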

These will allow us to "shift" rotations both to the left and
to the right. The only problem is that as a rotation is shifted it
leaves rotations with complex axis expressions in its wake.
There is a subgroup of rotations for which these axis
expressions are no more complex than the original. This is the
group of 24 rotations which permute the x, y and z axes
amongst themselves and their negations. When they are used
with the above identities the new axis expression is a
permutation of the original axis, with perhaps some sign
changes.

We will be particularly interested in a subset of that
rotational subgroup. It consists of the identity rotation i, and
rotations about the three coordinate axes whose magnitudes are
multiples of π/2. We write them x1, x2, x3, y1, y2, y3, z1, z2
and z3. The subscript indicates the magnitude of the rotation as
a multiple of π/2. We will call these ten rotations elementary. It
turns out that they are very commonly used rotations in
modeling man-made scenes. Rotation expressions for the
orientation of an object will often be chiefly composed of
elementary rotations.

Elementary rotations are closed under the identities given
above. For example:

x3*y1 = y1*z3
x3*y1 = z3*x3

Using the general identities given above we can simplify
expressions of products of rotations which include elementary
rotations by "moving" them, and multiplying out adjacent
elementary rotations which share the same axis. In particular,
using the following five simplification rules we can remove all
but at most two elementary rotations from a product of
rotations. There will be at most one elementary rotation at the
left of the expression, which will be one of z1, z2 or z3. At the
right of the expression there may be one of x1, x2, x3, y1 or y3.
The other rotations will merely have had the components of
their axis of rotation permuted and perhaps negated.

SR1: Compose adjacent elementary rotations sharing
the same axis of rotation.

SR2: Move instances of z1, z2 and z3 to the left,
applying SR1 whenever possible.

SR3: Move the left-most x-axis elementary rotation
to the right until it is adjacent to another, or it is the
right-most rotation in the expression. Move each
z-axis elementary rotation which may have been
introduced to the left of the expression. Apply SR1
to the result wherever necessary.

SR4: Move elementary y-axis rotations to the right,
stopping before any elementary x-axis rotation
which might be there. Apply SR1 whenever
necessary.

SR5: Make substitutions at the right of the
expression using the following identities and shift
any introduced elementary z-axis rotations to the left
of the expression, applying SR1 wherever necessary.

y1*x1 = z3*y1,  y1*x2 = z2*y1,  y1*x3 = z1*y1
y3*x1 = z1*y3,  y3*x2 = z2*y3,  y3*x3 = z3*y3
y2 = z2*x2
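
The elementary-rotation identities can be confirmed with exact
integer matrices. This is a minimal sketch under the assumption
that the elementary rotations are right-handed quarter turns; the
function names are hypothetical and the code only checks the
identities, it does not implement rules SR1 to SR5.

```python
# cos and sin of k quarter turns, exact for k = 0, 1, 2, 3
COS = (1, 0, -1, 0)
SIN = (0, 1, 0, -1)

def elem(axis, k):
    """Elementary rotation: k right-handed quarter turns about 'x', 'y' or 'z'."""
    c, s = COS[k % 4], SIN[k % 4]
    if axis == "x":
        return ((1, 0, 0), (0, c, -s), (0, s, c))
    if axis == "y":
        return ((c, 0, s), (0, 1, 0), (-s, 0, c))
    if axis == "z":
        return ((c, -s, 0), (s, c, 0), (0, 0, 1))
    raise ValueError("axis must be 'x', 'y' or 'z'")

def mul(a, b):
    """Exact 3x3 integer matrix product a*b."""
    return tuple(tuple(sum(a[i][n] * b[n][j] for n in range(3))
                       for j in range(3)) for i in range(3))

x1, x2, x3 = (elem("x", k) for k in (1, 2, 3))
y1, y2, y3 = (elem("y", k) for k in (1, 2, 3))
z1, z2, z3 = (elem("z", k) for k in (1, 2, 3))

# The two closure examples given for elementary rotations.
closure_examples = [
    (mul(x3, y1), mul(y1, z3)),
    (mul(x3, y1), mul(z3, x3)),
]

# The substitution identities listed under SR5.
sr5_identities = [
    (mul(y1, x1), mul(z3, y1)), (mul(y1, x2), mul(z2, y1)),
    (mul(y1, x3), mul(z1, y1)), (mul(y3, x1), mul(z1, y3)),
    (mul(y3, x2), mul(z2, y3)), (mul(y3, x3), mul(z3, y3)),
    (y2, mul(z2, x2)),
]

all_hold = all(lhs == rhs for lhs, rhs in closure_examples + sr5_identities)
```

With a left-handed convention some of these identities would change
sign, so the check also documents which convention the table
assumes.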

As an example, consider the following rotation expression.
It is more complex than is usually found in aerial images: it
comes from a ground level camera with small TILT and PAN,
viewing a subpart of a complex object which is oriented
upright, but arbitrarily, on the ground.

<x, -TILT>*<y, PAN>*z3*y3*<z, ORI>*y3*y1*y1

When the above rules are applied it simplifies to:

z3*<y, -TILT>*<x, -PAN>*<x, -ORI>
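
The simplification in this example can be checked by multiplying
out both expressions for sample angles. The sketch below assumes
right-handed rotations and uses hypothetical function names; it
compares the two products numerically rather than applying the
rules symbolically as the predictor and planner would.

```python
import math
from functools import reduce

def rx(a):
    """Right-handed rotation by angle a about the x axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def ry(a):
    """Right-handed rotation by angle a about the y axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rz(a):
    """Right-handed rotation by angle a about the z axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mul(a, b):
    """Matrix product a*b."""
    return [[sum(a[i][n] * b[n][j] for n in range(3)) for j in range(3)]
            for i in range(3)]

def chain(*rotations):
    """Left-to-right product of a sequence of rotation matrices."""
    return reduce(mul, rotations)

Q = math.pi / 2  # one quarter turn

def original(tilt, pan, ori):
    """<x,-TILT>*<y,PAN>*z3*y3*<z,ORI>*y3*y1*y1"""
    return chain(rx(-tilt), ry(pan), rz(3 * Q), ry(3 * Q),
                 rz(ori), ry(3 * Q), ry(Q), ry(Q))

def simplified(tilt, pan, ori):
    """z3*<y,-TILT>*<x,-PAN>*<x,-ORI>"""
    return chain(rz(3 * Q), ry(-tilt), rx(-pan), rx(-ori))

def max_difference(a, b):
    """Largest absolute entry-wise difference between two matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))
```

For example, `max_difference(original(0.1, -0.2, 1.3),
simplified(0.1, -0.2, 1.3))` is zero up to floating point rounding.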

The reason for wanting the particular form for the
rotation expression that the above rules supply is twofold. A
left-most rotation about the z axis corresponds to a rotation of
the image plane. Thus it can be ignored for the purpose of
predicting appearances. The possibilities for the right-most
elementary rotation (in the above case it is i, the identity)
correspond to the six views of a generalized cone from along
each axis ray in its local coordinate system. Many cones will be
symmetric with respect to the x-axis rotations, as these
correspond to rotation about the spine, and y1 and y3 correspond
to viewing the cross section at each end. Of course the
non-elementary rotations remaining may cause problems. If the
above expression is being used to predict the appearance of a
cylinder, then all the right-most rotations about the x axis can
be ignored. Cylinders are invariant with respect to rotations
about their spine. Thus the above expression could be treated
equivalently to:

<y, -TILT>

For highly constrained TILT, this too could be ignored.
We will not always be so lucky, but there will often be
substantial simplifications.

Consider as a second example the camera orientation
developed at the end of section 2 for the camera with pitch and
roll included. Suppose it is viewing an airplane sitting on
the ground with some arbitrary orientation. Let ORI be the
quantifier which corresponds to that degree of freedom. Then
the orientation expression for the rudder will be:

<y, -ROLL>*<x, -PITCH>*<z, ORI>*y3

The five simplification rules above leave this expression
invariant. Using the first of the identities given at the beginning
of the section the predictor and planner can symbolically shift
the unconstrained rotation to the left, so that the expression
becomes:

<z, ORI>*<(<z, -ORI>⊗y), -ROLL>
    *<(<z, -ORI>⊗x), -PITCH>*y3

The left-most rotation can now be ignored as it
corresponds only to a rotation of the image plane. If ROLL and
PITCH are constrained to be reasonable then the appearance of
the airplane will be quasi-invariant, as effects from the two
remaining rotations will be dependent on the cosines of ROLL
and PITCH which will be close to one. Thus the view of the
rudder will be that given by rotating it by y3 - the view from its
spine looking back at its top cross section.

Case  Analysis 

For  a  given  object  model  with  a  given  range  of  possible 
sizes,  structures  and  orientations,  it  may  not  be  possible  to  find 
adequate  quasi-invariants  to  predict  the  appearance  of  the 
object.  However  by  splitting  up  the  range  of  allowed  values  for 
some  quantifier,  it  will  often  be  possible  for  the  predictor  and 
planner  to  produce  subcases  where  quasi-invariants  do  exist. 

Consider for example images of oil tanks taken from a
downward looking camera from a relatively low altitude aircraft.
Depending on their lateral distance from the aircraft the tanks
will appear as circles, or as ellipses with short parallel sided
ribbons connected to them. Rules to make such predictions are
included in the predictor and planner. Further, both predictions
can be made if they are predicated on the value of the lateral
distance. To do this, new restriction nodes are introduced. They
provide constraints on the appropriate quantifiers, breaking the
possibilities into cases. Predictions are attached to the
appropriate restriction node. The following restriction nodes are
typical:

Node: Z0137
NAME: RESTRICTION
PARENT: GENERIC-OIL-TANK
TYPE: OBSERVABILITY-CASE-ANALYSIS
CONSTRAINTS: (POS-X: (λ (V)
                        (> (+$ (*$ V V)
                               (*$ POS-Y POS-Y))
                           1.0E6))
              POS-Y: (λ (V)
                        (> (+$ (*$ V V)
                               (*$ POS-X POS-X))
                           1.0E6)))

Node: Z0138
NAME: RESTRICTION
PARENT: GENERIC-OIL-TANK
TYPE: OBSERVABILITY-CASE-ANALYSIS
CONSTRAINTS: (POS-X: (λ (V)
                        (< (+$ (*$ V V)
                               (*$ POS-Y POS-Y))
                           1.44E6))
              POS-Y: (λ (V)
                        (< (+$ (*$ V V)
                               (*$ POS-X POS-X))
                           1.44E6)))

The first node is for the case of the oil tank being greater
than 1000 meters from the center of the point of view. The
second is for less than 1200 meters. It is quite all right that these
restriction nodes are not mutually exclusive, as they will be used
in a backward mode during matching, rather than a forward
mode. That is to say, matches for the oil tank model will be
hypothesized from some data, according to one of the two
observability descriptions attached to the above restriction
nodes. The appropriate constraints will be assumed on the basis
of which match is hypothesized. By not allowing hypotheses of
matches which lead to inconsistent constraints the implications
of local matches are propagated to enforce global consistency.
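
The overlapping cases can be pictured as predicates attached to
restriction nodes. The following Python sketch is only an
illustration of the idea: the class, field and function names are
hypothetical, and the assignment of the circle prediction to the
near case and the ellipse prediction to the far case is an
assumption read off the description above.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class RestrictionNode:
    name: str
    parent: str
    node_type: str
    constraint: Callable[[float, float], bool]  # test on (POS-X, POS-Y)
    prediction: str                             # attached observability prediction

# Far case: squared lateral distance above 1.0E6 (more than 1000 m).
FAR = RestrictionNode(
    "Z0137", "GENERIC-OIL-TANK", "OBSERVABILITY-CASE-ANALYSIS",
    lambda x, y: x * x + y * y > 1.0e6,
    "ellipse with short parallel-sided ribbon")

# Near case: squared lateral distance below 1.44E6 (less than 1200 m).
NEAR = RestrictionNode(
    "Z0138", "GENERIC-OIL-TANK", "OBSERVABILITY-CASE-ANALYSIS",
    lambda x, y: x * x + y * y < 1.44e6,
    "circle")

def admissible_cases(pos_x: float, pos_y: float,
                     nodes=(FAR, NEAR)) -> List[str]:
    """Names of the restriction nodes whose constraints the position satisfies."""
    return [n.name for n in nodes if n.constraint(pos_x, pos_y)]
```

A tank 1100 m from the origin satisfies both nodes, so either
prediction may be hypothesized for it; the constraints of whichever
node is matched are then carried forward for consistency checking.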

4. From 2-D back to 3-D

Previously  ([4],  [5],  [6])  we  have  described  the  general 
structure  of  the  observability  graph.  We  will  not  repeat  that 
here.  Instead  we  give  a  detailed  example  of  how  two 
dimensional  matching  can  be  used  to  understand  a  three 
dimensional  scene.  We  also  discuss  some  of  the  mechanisms  for 
combining  the  local  results  of  such  matchings  to  produce  a 
global  understanding  of  the  scene. 

Predicting  Uncertain  Size 

Consider  the  problem  of  predicting  the  length  of  the 
ribbon  which  will  correspond  to  the  fuselage  of  an  aircraft  as 
modeled  by  the  generalized  cone  of  figure  2.  Suppose  that  the 
image  is  an  aerial  view  taken  by  a  camera  at  a  height 
represented  by  the  quantifier  HEIGHT. 

A simple approach (and that previously used in
ACRONYM) is to calculate the extreme values allowable for
the FUSELAGE-LENGTH and HEIGHT and hence calculate
an upper and lower bound on the possible length of the ribbon
in the image. Then at match time ribbons whose length falls
within this range (and similarly for the width and taper) are
accepted as candidate fuselage matches. Later, constraints about
spatial relations with other ribbons are used to confirm or reject
the match.

This is unsatisfying however, as such local matches
provide no clue to the three dimensional size of the object, nor
the height of the camera. A match where the ribbon is at the
smaller end of the available range implies that HEIGHT is
really constrained to its larger values. Thus a match for a wing
which lies near the upper end of possible lengths for that
ribbon would be inconsistent, as it would imply that HEIGHT
really takes on one of its lower possible values.

We therefore associate a restriction node with each
potential match. The TYPE slot of the node distinguishes it
from other types of restrictions (HYPOTHESIS-MATCH is
used). The parent restriction is that associated with the
observability node being matched. The match itself provides
new constraints on some quantifiers. If these constraints are
inconsistent with those provided by the parent restriction then
the match is immediately rejected. Otherwise the constraints are
attached to the new restriction node. Later, as various local
matches are combined these constraints are checked for
consistency to see whether the local matches are globally
consistent.

We now work through the details of the fuselage example.
For a projective imaging system the observed distance m
between two points distance l apart on the ground is given by:

$$ m = \frac{c\,l}{h} $$

where c is a constant dependent on the focal distance of the
camera and h is the height of the camera above ground. We
will use the symbols l for FUSELAGE-LENGTH and h for
HEIGHT for brevity in the following equations. The
constraints from the restriction node associated with the
observability node may implicitly provide upper and lower
bounds on these quantifiers. The predictor and planner has
rules which examine sets of constraints and try (heuristically) to
deduce such bounds. Suppose the lower bounds are $l_l$ and $h_l$
and the upper bounds are $l_u$ and $h_u$. Then the simple
acceptance condition on m is:

$$ \frac{c\,l_l}{h_u} \le m \le \frac{c\,l_u}{h_l} $$

However we can interpret the above equation a different
way. If m is the observed or measured length of a ribbon, and it
does indeed correspond to a fuselage of the class of aircraft we
are looking for, then the following must be true of the length
l of the fuselage, and the height h of the camera:

$$ \frac{h_l\,m}{c} \le l \le \frac{h_u\,m}{c} $$

$$ \frac{l_l\,c}{m} \le h \le \frac{l_u\,c}{m} $$

Thus if we know bounds on one of l or h, then we are
able to calculate bounds for the other which would be implied
by accepting a match to the prediction. If these new bounds are
inconsistent with those known then the match should be
rejected.

FUSELAGE-LENGTH:
    (λ (V)
       (INTERVAL V
                 (*$ (LOWERB HEIGHT)
                     0.0015)
                 (*$ (UPPERB HEIGHT)
                     0.0015)))

HEIGHT:
    (λ (V)
       (INTERVAL V
                 (//$ (LOWERB FUSELAGE-LENGTH)
                      0.0015)
                 (//$ (UPPERB FUSELAGE-LENGTH)
                      0.0015)))

Fig. 5. Constraints generated by a match.

Figure 5 gives examples of the type of constraints that
might be generated at match time for c equal to 20 and a
measured length of the ribbon m of 0.03. Note that the
prediction itself would contain some code to generate the
constraint rather than the constraint itself, as m will not be
known at prediction time. The functionals UPPERB and
LOWERB are interpreted by the constraint consistency
checking mechanism to mean that upper and lower bounds for
the named quantifier should be deduced from the other
constraints known if possible. Notice too that the constraints
should still be true when further information is learned about
l or h. By re-evaluating these constraints at appropriate times
we will be able to determine at a later time whether this local
match is consistent with a more global interpretation. This is a
great strength of the system we are describing.
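
The effect of such match-time constraints can be sketched in a few
lines of Python. This is an illustrative approximation only: the
function name and the sample bounds on fuselage length and height
are hypothetical, and a single intersection of intervals stands in
for the heuristic bound deduction described in the text.

```python
def bounds_implied_by_match(m, c, l_bounds, h_bounds):
    """Intersect known bounds on l and h with those implied by m = c*l/h.

    m        -- measured ribbon length in the image
    c        -- camera constant
    l_bounds -- (lower, upper) bounds already known for the length l
    h_bounds -- (lower, upper) bounds already known for the height h
    Returns (new_l_bounds, new_h_bounds, consistent).
    """
    l_lo, l_hi = l_bounds
    h_lo, h_hi = h_bounds
    # l must lie between h_lo*m/c and h_hi*m/c.
    new_l = (max(l_lo, h_lo * m / c), min(l_hi, h_hi * m / c))
    # h must lie between l_lo*c/m and l_hi*c/m.
    new_h = (max(h_lo, l_lo * c / m), min(h_hi, l_hi * c / m))
    consistent = new_l[0] <= new_l[1] and new_h[0] <= new_h[1]
    return new_l, new_h, consistent

# c = 20 and m = 0.03 as in figure 5; the bounds below are made up.
accepted = bounds_implied_by_match(0.03, 20.0, (40.0, 70.0), (20000.0, 60000.0))

# A ribbon ten times longer cannot be a fuselage under the same bounds.
rejected = bounds_implied_by_match(0.3, 20.0, (40.0, 70.0), (20000.0, 60000.0))
```

In the first call the match is accepted and the height interval is
narrowed by the fuselage bounds; in the second the implied
intervals are empty, so the match would be rejected.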

From our experience with carrying out actual
interpretations we find that any predictions should allow
generously for errors (often 40-50%). This is because of the high
error rate in the descriptive processes. Such a policy means that
besides the desired matches, many new incorrect matches are
made at the local level. However because of the structured
nature of the models and the resulting inter-constrained
observability predictions any such incorrect matches are rejected
at later stages due to global inconsistencies. We have not
encountered a single case of an incorrect match surviving to the
final interpretation. The converse is not true. There have been
a number of cases where local matches have been missed, even
with large error margins allowed. More powerful descriptive
methods would help this problem.

From  Local  to  Global 

After local matching, the matcher combines the local
matches into more global interpretations. This involves finding
consistent subgraphs of matches. Previously consistency has only
been a matter of describing relations between matched ribbons.
With the introduction of constraints on quantifiers during the
ribbon matching process, these too must be checked for
consistency.

The constraints on a quantifier at different
HYPOTHESIS-MATCH restriction nodes may actually refer to
different quantities in the scene. For instance each potential
match for an aircraft may have constraints on
FUSELAGE-LENGTH and on HEIGHT. When combining
the matches for aircraft to produce an interpretation of the
image there is no reason to require that the constraints on
FUSELAGE-LENGTH at these different nodes be mutually
consistent. Different instances of wide-bodied passenger jet
[...]

Sometimes when constraints on quantifiers actually
correspond to different quantities in the world, it may be that
they should have the same value. For instance the
ENGINE-LENGTH for the port and starboard engines
correspond to physical measurements of different objects in the
world, but in a single aircraft the constraints given by the
matches on possible values of ENGINE-LENGTH for each
engine should be consistent. Thus when clumping the parts of
an aircraft the ENGINE-LENGTH constraints from each
submatch should be checked for consistency. If they are not
consistent the particular set of local matches should be
rejected as inconsistent.


A slot is provided in object units to represent which
quantifiers matched at a lower level should be held consistent
within an interpretation of an object. This is the QUANTIFIERS
slot as shown in figure [...]. As the matcher is combining local
matches it moves up the subpart tree. Any quantifier mentioned
in a QUANTIFIERS slot of any ancestor of the object has its
constraints copied into the restriction node for the new more
global match. As each constraint is introduced it is checked for
consistency. This process is not quite straightforward, as
sometimes a constraint on a quantifier is in terms of another
quantifier which is not being brought into the new match. Such
is the case of FUSELAGE-LENGTH and HEIGHT from
figure 5, when a global interpretation is being made involving
many aircraft. The constraints on HEIGHT must be replaced
with ones which use the current lower and upper bounds
determinable for FUSELAGE-LENGTH for each aircraft.

5. Conclusion

We have given details of our representational scheme for
generic and specific objects and incompletely specified scenes.
We have given explicit rules to help with geometric reasoning,
and methods for representing multiple predictions about the
same object. We have shown how to use two dimensional match
information to infer three dimensional information. All these
rely heavily on the representational tools of quantifiers and
restriction nodes. We have described three uses for restriction
nodes:

(a) producing models which are instances or
subclasses of class models.

(b) providing a mechanism for case analysis in
prediction.

(c) allowing multiple matches against single models
and subsequent independent reasoning about each
instance.

The three types of restriction node are tagged and hence
distinguishable when necessary. However the three cases involve
many similar computational problems and the common
representation allows the reasoning system to handle these
common computational problems with precisely the same
mechanisms.

We have demonstrated how constraints from individual
local matches can be combined to produce globally consistent
interpretations.


References

[1] Binford, Thomas O., "Visual Perception by Computer,"
Invited paper at IEEE Systems Science and Cybernetics
Conference, Miami, Dec. 1971.

[2] Bobrow, Daniel G. and Terry Winograd, "An Overview of
KRL, a Knowledge Representation Language," Cognitive Science
1, 1977, 3-46.

[3] Brooks, Rodney A., "Goal-Directed Edge Linking and
Ribbon Finding," Proc. ARPA Image Understanding Workshop,
Palo Alto, Apr. 1979, 72-78.

[4] Brooks, Rodney A., Russell Greiner and Thomas O.
Binford, "A Model-Based Vision System," Proc. ARPA Image
Understanding Workshop, Cambridge, May 1978, 38-44.

[5] Brooks, Rodney A., Russell Greiner and Thomas O.
Binford, "Progress Report on a Model-Based Vision System,"
Proc. ARPA Image Understanding Workshop, Pittsburgh, Nov.
1978, 145-151.

[6] Brooks, Rodney A., Russell Greiner and Thomas O.
Binford, "The ACRONYM Model-Based Vision System," Proc.
of IJCAI-79, Tokyo, Aug. 1979, 105-113.

[7] Nevatia, Ramakant and K. Ramesh Babu, "Linear Feature
Extraction and Description," Proc. of IJCAI-79, Tokyo, Aug.
1979, 639-641.


A STORAGE REPRESENTATION FOR
EFFICIENT ACCESS TO
LARGE MULTI-DIMENSIONAL ARRAYS

Lynn H. Quam
SRI International
Menlo Park, California 94025


ABSTRACT: 

This paper addresses problems associated
with the access of elements of large
multi-dimensional arrays when the order of access
is either unpredictable or is orthogonal to the
conventional order of array storage. Large arrays
are defined as arrays which are larger than the
physical memory immediately available to store
them. Such arrays must be accessed either by the
virtual memory system of the computer and operating
system, or by direct input and output of blocks of
the array to a file system. In either case, the
direct result of an inappropriate order of
reference to the elements of the array is the very
time-consuming movement of data between levels in
the memory hierarchy, often costing factors of
three orders of magnitude in algorithm performance.

The access to elements of large arrays is
decomposed into three steps: the transformation of
the subscript values of an n-dimensional array into
the element number in a 1-dimensional virtual
array, the mapping of virtual array position to
physical memory position, and the access to the
array element in physical memory. The virtual to
physical mapping step is unnecessary on computer
systems with sufficiently large virtual address
spaces. This paper is primarily concerned with the
first of these steps.

A subscript transformation is proposed
which solves many of the order-of-access problems
associated with conventional array storage. This
transformation is based on an additive
decomposition of the calculation of element number
in the array into the sum of a set of integer
functions applied to the set of subscripts as
follows:

element-number(i,j,...) = fi(i) + fj(j) + ...

The choices for the transformation
functions which minimize access time to the
elements of the array depend on the characteristics
of the memory hierarchy of the computer system and
the order of accesses to the elements of the array.
It is conjectured that given appropriate models for
system and algorithm access characteristics, then a
pragmatically optimum choice can be made for the
subscript transformation functions. In general,
these models must be stochastic, but in certain
cases deterministic models are possible.

The use of tables to evaluate the functions
fi and fj makes the implementation very efficient
using conventional computers. When the array
accesses are made in an order inappropriate to
conventional array storage order, this scheme
requires far less time than conventional array
accessing schemes; otherwise the accessing times
are comparable.

The semantics of a set of procedures for array
access, array creation, and the association of
arrays with file names are defined. For computer
systems with insufficient virtual memory, such as
the PDP-10, a software virtual to physical mapping
scheme is given in appendix III. Implementations
are also given there for the VAX and PDP-10 series
computers to access pixels of large images
stored as 2-dimensional arrays of n bits per
element.


INTRODUCTION: 

This paper addresses problems associated
with the access of elements of large
multi-dimensional arrays when the order of access
is either unpredictable or is orthogonal to the
conventional order of array storage. Large arrays
are defined as arrays which are larger than the
physical memory immediately available to store
them. Such arrays must be accessed either by the
virtual memory system of the computer and operating
system, or by direct input and output of blocks of
the array to a file system. In either case, the
direct result of an inappropriate order of
reference to the elements of the array is the very
time-consuming movement of data between levels in
the memory hierarchy, often costing factors of
three orders of magnitude in algorithm performance.

The access to elements of large arrays is
decomposed into three steps: the transformation of
the subscript values of an n-dimensional array into
the element number in a 1-dimensional virtual
array, the mapping of virtual array position to
physical memory position, and the access to the
array element in physical memory. The virtual to
physical mapping step is unnecessary on computer
systems with sufficiently large virtual address
spaces. This paper is primarily concerned with the
first of these steps.

The subscript transformation which is
conventionally used is a linear combination of the
subscript values, of the form:

element-number(i,j,k,...) = a + b*i + c*j + ...



.•sssssr.rx*^  ”*?  °r 

t.i“utZZ‘Ta"  ”  tr*1™ is 

sum  or  a  aet  oAntegei  "lwl 13  the  a,I"v  into  the 


image  i  a  air'  * 

P.t  «.=»),  >ng  .  p.g,  "Jo"’"1’ 

*”p  °>  uoiu.u ,0f;„ 


to  64  pages.  For  an 


22= s- 

tt.sP&L  FrV -s 

“he  same  window  f?  nm  i  m  imuariy,  to  extract 
t:,ui.es  aeoea,  £-  "Hr 

achieve  widespread  ^T8/1'1^  BUat  be  addressed  to 
oriented  image  represents  arVS 

arrangement^of^the^pixels  °' 

transparency  of  use  t  ^ 

PROPOSED  SUBSCRIPT  TRANSFORMATION: 

The proposed transformation from array
subscript values to the element number is:

element-number(i, j, k, ...) = fi(i) + fj(j) + ...

The following examples illustrate some of
the different choices of the functions fi and fj.

Example 1: Storage by row, ni elements per row:

element-number(i,j) = i*ni + j

or fi(i) = i*ni

fj(j) = j

Example 2: Storage by column, nj elements per
column:

element-number(i,j) = i + j*nj

or fi(i) = i

fj(j) = j*nj
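
The two conventional layouts can be written as a pair of precomputed tables. The sketch below is illustrative only: the function names and the use of `n_rows` and `n_cols` (in place of the paper's ni and nj) are assumptions made to keep the row and column lengths unambiguous.

```python
def row_major_tables(n_rows, n_cols):
    """Storage by row: fi(i) = i * (elements per row), fj(j) = j."""
    fi = [i * n_cols for i in range(n_rows)]
    fj = list(range(n_cols))
    return fi, fj


def column_major_tables(n_rows, n_cols):
    """Storage by column: fi(i) = i, fj(j) = j * (elements per column)."""
    fi = list(range(n_rows))
    fj = [j * n_rows for j in range(n_cols)]
    return fi, fj


def element_number(fi, fj, i, j):
    """Additive decomposition: one table lookup per subscript, then a sum."""
    return fi[i] + fj[j]
```

Either pair of tables can be handed to the same `element_number` routine, which is the point of the additive form: the access code does not change when the storage order does.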

Example 3: Storage by block:

it  ia  of^ua^l1^0^1  r°WS  Snd  nj  c°l™ns, 

rectangular  blocka  which ^  3rray  into 

relaw'to ^T1?' tM?j 

in  the  range  0/5^0  “^bytef  luT  1^°^ 

aX^o^ith 

(In the representation proposed here, the columns
must be padded to exactly fill an integral number of
array blocks to enable the additive decomposition of
the transformation.)

If the array elements are stored in blocks
in row order, and the blocks are stored in row
order, then the following calculation will compute
the element number within the array:


element-number (i , j, k, . . . ) 


fi(i)  -  fj(j) 


A  CASE  STUDY 


The  Storage  of  Large  Images 


c-e  of  £  slorale6  o°ff  S£  £*»  ^ 
It has been common practice in digital image
analysis to store images as two dimensional arrays
in a conventional order, for example by rows in a
left to right, top to bottom manner referred to as
row order. Algorithms that access pixels in the
same order that they were stored can efficiently
handle even very large images

stored  on  seZ^VtZ^  ^  ^  ' 

There are image analysis algorithms which do not
access the pixels in strict row order. Geometric
transformations such as image rotation and
transposition have a very predictable order of
access, but it is orthogonal to the storage order.
Such algorithms have deterministic models for
accessing order.

conventionally  stote^  otn°eap^e?peJ8byfeSe  ^ 

the  system  useq  nnv  ,  per  D^te*  Assume 

related  to  the  least  r-  r3placemenb  algorithm 
Under  those  conditio  *2^1"  ale°rithm. 

calculation  ^ 

if  the  entire  image  wouldnnf  r  ^  GVery  pixel 
memory.  On  a  VAX  on  4-  into  physical 

VMS  or  Berkeley  5nIX  “iS  7™™  eUhei 

and  a  cost  of  approximately  30^mnvZ6  °f  'K  byt®S’ 
page  fault  thi  4.  ,  y  billiseconds  ner 

onf  rn,“~ 16  Homs'  ^ 

systems  such  as  (CL  m?  Proceasing.  other 

produce  comparable'resuUs  rUnning  T°PS’2° 
edfe  suoh  as 

but^t'ctrte’mod'elled^as'  7^  dePendent. 

ZlulLZTZl  X^rSvin^86 

^rroomp^xrthMttheUally  th6S6  m6°haa  *“  ®re  far 

themselves^nd^introduo^other^nef  fia^b°rdtbmS 

severe  restrictions.  inefficiencies  or 

js,f  i^'itrdo“r;i:rrrv  th“ 

«u  ■  «r*STu' :rre”“ 




element-number(i,j) =

    (j mod bj)
  + (i mod bi) * bj
  + (j div bj) * bi*bj
  + (i div bi) * bi*bj * (nj div bj)

or fi(i) = (i mod bi) * bj
         + (i div bi) * bi*bj * (nj div bj)

   fj(j) = (j mod bj)
         + (j div bj) * bi*bj

In the proposed accessing scheme, the
values of fi and fj are pre-calculated over the
ranges of each subscript and stored in tables,
rather than calculated on every access to the
array.
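
As a minimal sketch of the pre-calculation described above, the block-storage tables of Example 3 can be built once and then used for every access. The function name and the check that the dimensions are already padded are assumptions for illustration; ni, nj, bi and bj follow the notation of the example.

```python
def block_tables(ni, nj, bi, bj):
    """Build fi, fj for an ni x nj array stored in bi x bj blocks.

    Elements inside a block are in row order and the blocks themselves
    are in row order. The dimensions are assumed to be already padded
    to a whole number of blocks.
    """
    if ni % bi or nj % bj:
        raise ValueError("array must be padded to a whole number of blocks")
    blocks_per_block_row = nj // bj
    fi = [(i % bi) * bj + (i // bi) * bi * bj * blocks_per_block_row
          for i in range(ni)]
    fj = [(j % bj) + (j // bj) * bi * bj for j in range(nj)]
    return fi, fj


# Each access is then two table lookups and one addition.
fi, fj = block_tables(4, 6, 2, 3)
element = fi[2] + fj[4]
```

Because every block occupies a contiguous run of bi*bj element numbers, choosing the block size to match a page keeps a small two-dimensional neighborhood on one page.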

Example 4: Storage by nested blocks:

A generalization of the block storage
scheme is the division of the array into a nested
sequence of blocks. At each level k in the
sequence the block consists of bi[k] x bj[k]
sub-blocks  of  level  k-1.  Block  level  0  consists  of 
a  single  array  element. 

Assume that the array has been padded to
dimensions ni and nj which have as factors the
desired block dimensions:

ni = bi[1] * bi[2] * ... * bi[nk]

nj = bj[1] * bj[2] * ... * bj[nk]

define pi[0] = 1, pi[k] = pi[k-1] * bi[k]

and    pj[0] = 1, pj[k] = pj[k-1] * bj[k]


then fi(i) =

sum ((i div pi[k-1]) mod bi[k]) * pi[k-1] * pj[k]
k=1,nk

and fj(j) =

sum ((j div pj[k-1]) mod bj[k]) * pi[k-1] * pj[k-1]
k=1,nk

An interesting case of the nested block
representation is the binary case, where for all k,
bi[k]=bj[k]=2. This subscript to element-number
transformation would cause a 2-d array to be stored
as follows:


 0   1   4   5  16  17  20  21  64  65  68  69 ...
 2   3   6   7  18  19  22  23  66  67  70  71 ...
 8   9  12  13  24  25  28  29  72  73  76  77 ...
10  11  14  15  26  27  30  31  74  75  78  79 ...
32  33  36  37  48  49  52  53  96  97 100 101 ...
34  35  38  39  50  51  54  55  98  99 102 103 ...
40  41  44  45  56  57  60  61 104 105 108 109 ...
42  43  46  47  58  59  62  63 106 107 110 111 ...


The primary advantage of the binary nested
block representation is that it is directionally
isotropic in efficiency of access at all scales,
from very small array element neighborhoods to
large blocks of very large arrays. This
arrangement is useful because of the hierarchical
structure of computer memory.
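
The nested block tables can be generated directly from the definitions of pi, pj, fi and fj in Example 4. The following is a sketch under the assumption that the per-level block factors are passed as Python lists whose first entry is level 1; the function name is hypothetical.

```python
def nested_block_tables(bi, bj):
    """Build fi, fj for nested block storage.

    bi[k-1] and bj[k-1] hold the block factors of level k (k = 1..nk).
    pi and pj are the running products defined in the text, with
    pi[0] = pj[0] = 1.
    """
    nk = len(bi)
    pi, pj = [1], [1]
    for k in range(nk):
        pi.append(pi[-1] * bi[k])
        pj.append(pj[-1] * bj[k])
    ni, nj = pi[nk], pj[nk]
    # Level k of the text corresponds to list index k-1 here.
    fi = [sum(((i // pi[k]) % bi[k]) * pi[k] * pj[k + 1] for k in range(nk))
          for i in range(ni)]
    fj = [sum(((j // pj[k]) % bj[k]) * pi[k] * pj[k] for k in range(nk))
          for j in range(nj)]
    return fi, fj


# Binary case, three levels: an 8 x 8 array.
fi, fj = nested_block_tables([2, 2, 2], [2, 2, 2])
layout = [[fi[i] + fj[j] for j in range(8)] for i in range(8)]
```

For the binary case the rows of `layout` reproduce the upper left 8 x 8 corner of the storage pattern shown above.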


ORDER  OF  ACCESS  AND  MEMORY  HIERARCHY: 

Nearly all computer systems have at least
two levels of memory hierarchy: the main memory of
the CPU and a disk file system. Many newer
computers have an additional level of memory
hierarchy which consists of a high speed cache
between the CPU and the main memory. In the near
future, high speed page caches built from bubble
memory elements are expected. Most of these levels
of memory exist because of tradeoffs between cost
per bit and access time.

It is conjectured that given a model for
the characteristics of the memory hierarchy
together with a model for the order of accesses
made to the elements of an array stored in that
memory hierarchy, a pragmatically optimum
choice can be made for the subscript transformation
functions for that array. Future research is
needed to explore this conjecture.

For algorithms whose order of access does
not depend on the content of the data, the order of
access is almost trivial to model. The order of
access of the remaining algorithms must be modelled
stochastically. For example, the order of access
of an edge following algorithm might be modelled by
the statistics of a random walk. However, a
complete model for the order of access to the
memory is quite difficult in a multi-process
environment.

Memory  hierarchy  might  be  adequately 
modelled  by  parameterizing  each  level  of  the 
hierarchy  in  terms  of  its  granularity  of  storage, 
transfer latency time, and transfer bandwidth.

In the absence of specific models, a useful
strategy for selecting the subscript transformation
tables is to provide directionally isotropic time
of access to regions of any size. This is
important in order to take advantage of
characteristics of the memory hierarchy, such as the
ability to cluster many pages on the same track of
the disk, and many tracks in the same cylinder, to
minimize head motion. The binary nested block
scheme implements this strategy without needing
a specific model for the memory hierarchy.
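
The effect of directional isotropy can be illustrated by counting how many distinct pages a scan along one row and a scan along one column touch. This is a sketch only: the 256 x 256 image, the 512-element page, and the helper names are assumptions chosen for the illustration, and page counts are a rough proxy for paging cost rather than a model of any particular system.

```python
def row_major_tables(n):
    """Conventional storage by row for an n x n array."""
    return [i * n for i in range(n)], list(range(n))


def binary_nested_tables(n):
    """Binary nested blocks for an n x n array, n a power of two.

    Bit k of j contributes 4**k and bit k of i contributes 2 * 4**k,
    which is the bi[k] = bj[k] = 2 case of the nested block scheme.
    """
    bits = n.bit_length() - 1

    def spread(v):
        return sum(((v >> k) & 1) << (2 * k) for k in range(bits))

    return [2 * spread(i) for i in range(n)], [spread(j) for j in range(n)]


def pages_touched(fi, fj, cells, page_size):
    """Number of distinct pages referenced by a sequence of (i, j) cells."""
    return len({(fi[i] + fj[j]) // page_size for i, j in cells})


n, page_size = 256, 512
row_scan = [(0, j) for j in range(n)]
col_scan = [(i, 0) for i in range(n)]

results = {}
for name, (fi, fj) in {"row-major": row_major_tables(n),
                       "binary nested": binary_nested_tables(n)}.items():
    results[name] = (pages_touched(fi, fj, row_scan, page_size),
                     pages_touched(fi, fj, col_scan, page_size))
```

Under these assumptions the row-major layout touches 1 page along the row but 128 along the column, while the binary nested layout touches 8 and 16 respectively, so neither direction is heavily penalized.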


TRANSFORMATIONS  VIA  ACCESS  TABLE  MODIFICATION: 

A number of useful geometric
transformations on arrays can be accomplished by
modifying the fi, fj access tables, without any
modification of the array itself. These
transformations include rotation by multiples of 90
degrees, transposition, reflection, scaling, and
translation. For instance, transposition is
achieved by the exchange of tables between
subscripts, reflection by reversing the content of
the table, and rotation by a combination of
transposition and reflection. Scale change and
translation are accomplished by performing a linear
mapping on the reference to the table entries. It
is improper to linearly map the actual values of
the table entries.
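
A small sketch of these table manipulations follows. The 2 x 3 sample array and the helper `get` are hypothetical; the stored data list is never modified, only the tables used to reach it.

```python
def get(data, fi, fj, i, j):
    """Fetch element (i, j) through the access tables."""
    return data[fi[i] + fj[j]]


# A 2 x 3 array stored by row: [[10, 11, 12], [20, 21, 22]]
data = [10, 11, 12, 20, 21, 22]
fi, fj = [0, 3], [0, 1, 2]

# Transposition: exchange the tables between the subscripts.
ti, tj = fj, fi

# Reflection in i: reverse the content of the i table.
ri, rj = fi[::-1], fj

# Rotation by 90 degrees: a transposition combined with a reflection.
qi, qj = fj[::-1], fi

# Scale change by 2 in j: a linear map on the reference to the table
# (which entry is used), not on the values stored in the table.
si, sj = fi, [fj[2 * j] for j in range(2)]

transposed = [[get(data, ti, tj, i, j) for j in range(2)] for i in range(3)]
rotated = [[get(data, qi, qj, i, j) for j in range(2)] for i in range(3)]
```

Each derived view costs only the construction of a new table whose length is one array dimension, which is negligible next to copying or rearranging the image itself.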




IMPLEMENTATION: 

Good programming practice requires that
algorithms should not have imbedded in them
unnecessary information about the objects which
they manipulate. This notion, which is often
called encapsulation, suggests that the details of
the storage of arrays should be of no concern to an
algorithm which manipulates an array. Hence, the
following functional forms of access are
recommended:

value := get-element(array, i, j, ...)
put-element(array, i, j, ..., value)
boolean := is-element(array, i, j, ...)

Is-element returns true if the subscripts
are within the bounds of the array.

Such arrays are constructed by the
function

array := new-array(type, size, ilb, iub, jlb, jub, ...)

This  function  allocates  an  array  with 
subscript  bounds  determined  by  ilb,  iub,  ...  using 
an  implementation  dependent  default  strategy  to 
construct  the  tables.  Type  and  size  determine  the 
data  type  and  number  of  bits  of  the  array  elements. 
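
To make the encapsulation concrete, here is a minimal in-memory sketch of the recommended access forms for the 2-d case. The class name `TableArray` is hypothetical, the type and size parameters are omitted because a Python list can hold any value, and storage by row is assumed as the default table strategy.

```python
class TableArray:
    """2-d array reached only through get/put/is-element."""

    def __init__(self, ilb, iub, jlb, jub, fill=0):
        self.ilb, self.iub, self.jlb, self.jub = ilb, iub, jlb, jub
        n_rows = iub - ilb + 1
        n_cols = jub - jlb + 1
        # Default strategy (an assumption of this sketch): storage by row.
        self.itab = [r * n_cols for r in range(n_rows)]
        self.jtab = list(range(n_cols))
        self.data = [fill] * (n_rows * n_cols)

    def is_element(self, i, j):
        """True if the subscripts are within the bounds of the array."""
        return self.ilb <= i <= self.iub and self.jlb <= j <= self.jub

    def get_element(self, i, j):
        return self.data[self.itab[i - self.ilb] + self.jtab[j - self.jlb]]

    def put_element(self, i, j, value):
        self.data[self.itab[i - self.ilb] + self.jtab[j - self.jlb]] = value


a = TableArray(1, 3, -1, 2)
a.put_element(3, -1, 7)
```

An algorithm written against these three calls keeps working if `itab` and `jtab` are later replaced by block or nested block tables.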

An alternate form of array construction for
user specified transformation tables is:

array := new-array(type, size, i-table, j-table, ...)

The bounds of the array are determined from
the array bounds in the user specified tables.

The bounds of an array can be accessed by:
get-bounds(array, size, ilb, iub, jlb, jub, ...)
where ilb, iub, ... are output parameters.

Input and output of such arrays is
accomplished by:

array := get-array(file-name, mode)

Get-array constructs and returns a pointer
to allow access to the array file on file-name.
Mode specifies the permissible access mode
(read-only or read-write) for references to the
array. It is assumed that no pages of array data
are actually moved between virtual memory and the
file until requested by element accesses to those
pages. Therefore the size of the array does not
significantly affect the time to access an element.

put-array(file-name, array, copy)

Put-array has the semantic effect of
creating an array file named file-name which
corresponds to the data in the array. The
underlying implementation details may vary, but in
the cases where the array is mapped to a temporary
file, the parameter copy determines whether the
data in the array is actually copied to the new file
rather than renaming the temporary file.


DISCUSSION:

The proposed additive decomposition scheme
using tables to evaluate the functions fi and fj
makes the implementation very efficient on
conventional computers. When the array accesses
are made in an order inappropriate to conventional
array storage order, this scheme requires far less
time than conventional array accessing
schemes; otherwise the accessing times are
comparable (see appendix III). This scheme also
allows for efficient access to images much
larger than the virtual space of the machine.

The choices for these transformation
functions which maximize the memory hit ratio
depend on the characteristics of the memory
hierarchy of the computer system and the order of
accesses to the elements of the array. It is
conjectured that there are easily obtained models
for system and algorithm access characteristics,
from which a pragmatically optimum choice can be
made for the subscript transformation functions.

A number of useful geometric
transformations, such as rotation, transposition,
translation, and scaling, can be performed by
modifying these tables, and require no
modifications to the storage of the array.

The semantics of a set of procedures for array
access, array creation, and the association of
arrays with file names were defined. For computer
systems with insufficient virtual memory, such as
the PDP-10, a software virtual to physical mapping
scheme is used. Implementations for the VAX and
PDP-10 series computers to access pixels of large
images stored as 2-dimensional arrays of n bits per
element were presented in appendix III. Programs
written in conformance with the proposed set of
image primitives will be machine and operating
system independent and will thereby greatly
facilitate the interchange of image understanding
programs.


REFERENCES 

Reddy, Raj and Greg Gill (1978). "Representation
Complexity of Image Data Structures", Proceedings:
Image Understanding Workshop, May 1978, pp 28-30.


APPENDIX  I. 

Implementation Semantics

This appendix contains a functional
specification written in a mixture of ADA and
English for a package of procedures for
2-dimensional array access to arrays of a single,
fixed type. Appendix II contains a more complete
example of how a subroutine package might be
implemented. The intent of this specification is
to define an implementation independent set of
procedures and semantics for access to large
arrays.

TYPE subscript IS INTEGER;

TYPE bigsubscript IS LONG INTEGER;

TYPE x_type IS -- the type of the array elements

TYPE table_record IS
  RECORD
    lb,ub : subscript;
      -- lower and upper bounds for table
    element: ARRAY (lb .. ub) OF bigsubscript;
  END RECORD;

TYPE table IS ACCESS table_record;

TYPE x_type_array_record IS
  RECORD itab : table; -- i transformation table
         jtab : table; -- j transformation table
         elementtype: CONSTANT x_type id;
         elementsize: CONSTANT x_type'SIZE;
           -- number of bits per element

  -- The remaining fields are implementation dependent.
  -- For a virtual memory implementation a pointer to an
  -- array of x_type is defined. For a software mapped
  -- implementation a page table of pointers to arrays of
  -- x_type is defined.

  END RECORD;

TYPE x_type_array IS ACCESS x_type_array_record;


FUNCTION getelement( p : IN x_type_array;
                     i,j : IN subscript)
  RETURN x;

PROCEDURE putelement( p : IN x_type_array;
                      i,j : IN subscript;
                      val : IN x);

FUNCTION iselement( p : IN x_type_array;
                    i,j : IN subscript)
  RETURN BOOLEAN IS
    RETURN
      i IN RANGE p.itab.lb .. p.itab.ub
      and j IN RANGE p.jtab.lb .. p.jtab.ub

FUNCTION newxarray(
    si,sj : IN subscript;
      -- number of columns and rows
    itab, jtab : IN table := NULL)
  RETURN x_type_array;

This function allocates an x array with
dimensions si, sj. The optional transform tables
allow the user to specify his own coordinate
transform strategy; otherwise CREATETRANSFORM
uses an implementation dependent default strategy to
construct the tables. NEWXARRAY calls
CHECKTRANSFORM to check the range values of user
specified itab, jtab transform tables to guarantee
that all possible accesses using them will be
within the bounds of the x_array.


TYPE accessmode IS (readonly, writeonly, readwrite);

FUNCTION xarrayinput(
    filename : STRING;
    mode : accessmode := readonly)
  RETURN x_type_array;

Xarrayinput constructs and returns an
xarraypointer to allow access to the array file on
filename. Mode specifies the permissible
accessmode for references to the array. It is
assumed that no pages of array data are actually
moved between virtual memory and the file until
requested by element accesses to those pages.
Therefore the size of the array does not
significantly affect the time to access an element.


PROCEDURE xarrayoutput(
    p : x_type_array;
    filename : STRING;
    copy : BOOLEAN := FALSE);

XARRAYOUTPUT has the semantic effect of
creating an array file named filename which
corresponds to the data in array p. The underlying
implementation details may vary, but the following
effects are intended:

if p resides in memory, then p is written into an
ARRAY file.

if p resides on a temporary file and copy=false
then the temporary file is renamed to
filename.

if p resides on a non-temporary file or copy=true
then an exact copy of p is created on
filename.


Xarray File Representation:

The precise file format is implementation
dependent, but must contain sx, sy, xtypecode,
xtab, ytab, and all of the pages of the array data.
The following format is recommended:

A word is defined to be 32 or 36 bits
depending on the host computer. Pgsiz is the
length of a virtual memory page on the host.
A pointer within the file is relative to the
start of the file. A pointer within a subfile is
relative to word 0 of the subfile.

subfile
word
number

0: unique identifier for this subfile format
1: pointer within file to another subfile
2: sx - number of columns
3: sy - number of rows
4: xtypecode - a unique code identifying the
   data type x
5: xtypesize - the number of bits per element
   in data type x
6: pointer within subfile to the start of xtab
7: pointer within subfile to the start of ytab
8: pointer within subfile to the first page of
   array data
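
As an illustration of the recommended layout, the nine header words can be packed and unpacked as fixed-width integers. This sketch assumes 32-bit words in little-endian byte order, which the format itself does not prescribe, and the function names and sample values are hypothetical.

```python
import struct

HEADER_FORMAT = "<9I"  # nine unsigned 32-bit words; byte order is an assumption
HEADER_FIELDS = ("identifier", "next_subfile", "sx", "sy", "xtypecode",
                 "xtypesize", "xtab_ptr", "ytab_ptr", "data_ptr")


def pack_header(**fields):
    """Pack the subfile header words 0..8 in the recommended order."""
    return struct.pack(HEADER_FORMAT, *(fields[name] for name in HEADER_FIELDS))


def unpack_header(raw):
    """Return the header words as a dictionary keyed by field name."""
    size = struct.calcsize(HEADER_FORMAT)
    return dict(zip(HEADER_FIELDS, struct.unpack(HEADER_FORMAT, raw[:size])))


header = pack_header(identifier=1, next_subfile=0, sx=512, sy=256,
                     xtypecode=3, xtypesize=8, xtab_ptr=9,
                     ytab_ptr=9 + 512, data_ptr=1024)
```

The table pointers in the sample are counted in words from word 0 of the subfile, following the convention stated above.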


APPENDIX II

ADA DEFINITIONS FOR 2-D ACCESS TO LARGE ARRAYS

TYPE subscript IS INTEGER;

TYPE bigsubscript IS LONG INTEGER;

TYPE x_type IS -- whatever type you want here for
               -- the array elements

TYPE table_record IS
  RECORD
    lb,ub : subscript;
      -- lower and upper bounds for table
    element: ARRAY (lb .. ub) OF bigsubscript;
  END RECORD;

TYPE table IS ACCESS table_record;


TYPE x_type_array_body IS ARRAY (bigsubscript) OF x;

FOR x_type_array_body USE PACKING;
  -- to maximize storage efficiency

TYPE xarraybodyrecord IS
  RECORD size : bigsubscript;
         element : x_type_array_body(0 .. size);
  END RECORD;

TYPE xarraybodyrecordpointer IS
  ACCESS xarraybodyrecord;

xarraypagesize : CONSTANT 512; -- for example

TYPE xarraypage IS
  RECORD
    data : x_type_array_body(0 .. xarraypagesize-1);
  END RECORD;

TYPE xarraypagepointer IS ACCESS xarraypage;

TYPE x_type_array_record IS
  RECORD itab : table; -- i transformation table
         jtab : table; -- j transformation table
         elementtype: CONSTANT x_type id;
         elementsize: CONSTANT x_type'SIZE;
           -- number of bits per element
         virtual : BOOLEAN;
           -- TRUE => virtual implementation
         CASE virtual OF
           WHEN TRUE =>
             data: xarraybodyrecordpointer;
               -- this is a pointer to the data
           WHEN FALSE =>
             numpages: INTEGER;
             pagesize: CONSTANT INTEGER xarraypagesize;
               -- for use by storage manager
             extdata : filedescr;
               -- a record structure for the file
               -- holding the data
             pagemap : ARRAY (1 .. numpages) OF
                         xarraypagepointer;
         END CASE;
  END RECORD;

TYPE x_type_array IS ACCESS x_type_array_record;


FUNCTION getelement(
    p : IN x_type_array;
    i,j : IN subscript)
  RETURN x IS
BEGIN
  elenum: bigsubscript;
  pagnum: INTEGER;

  elenum := p.itab.element(i) + p.jtab.element(j);
  CASE p.virtual OF
    WHEN TRUE => RETURN(p.data.element(elenum));
    WHEN FALSE =>
      pagnum := 1 + elenum / pagesize;
      elenum := elenum MOD pagesize;
      IF p.pagemap(pagnum) = NULL THEN
        pagefault(p, pagnum);
      END IF;
      RETURN(p.pagemap(pagnum).data(elenum));
  END CASE;
END;


PROCEDURE putelement(
    p : IN x_type_array;
    i,j : IN subscript;
    val : IN x) IS
BEGIN
  elenum: bigsubscript;
  pagnum: INTEGER;

  elenum := p.itab.element(i) + p.jtab.element(j);
  CASE p.virtual OF
    WHEN TRUE => p.data.element(elenum) := val;
    WHEN FALSE =>
      pagnum := 1 + elenum / pagesize;
      elenum := elenum MOD pagesize;
      IF p.pagemap(pagnum) = NULL THEN
        pagefault(p, pagnum);
      END IF;
      p.pagemap(pagnum).data(elenum) := val;
  END CASE;
END;

The  procedure  PAGEFAULT  is  not  defined 
here,  since  it  will  normally  be  implementation 
dependent.  The  intent  is  that  space  will  be  found 
large  enough  to  store  one  xarraypage,  and  a 
corresponding  xarraypagepointer  will  be  stored  into 
p.pagemap(pagenumber). Usually, this will require
searching  a  global  table  of  pages  to  find  either  a 
free  page,  or  the  least-recently-used  page  which 
will  be  "kicked-out". 
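
The intent of PAGEFAULT can be sketched with a small software-paged store that keeps a bounded number of resident pages and evicts the least recently used one. This is an illustration under stated assumptions, not the implementation used by the authors: the class `PagedStore` is hypothetical, a Python list stands in for the array file, page numbers start at 0 rather than 1, and every evicted page is written back whether or not it was modified.

```python
from collections import OrderedDict


class PagedStore:
    """Element store with a software page map and LRU replacement."""

    def __init__(self, backing, page_size, max_resident):
        self.backing = backing          # stands in for the array file
        self.page_size = page_size
        self.max_resident = max_resident
        self.pagemap = OrderedDict()    # page number -> resident page
        self.faults = 0

    def _pagefault(self, pagnum):
        """Make room if needed, then bring page pagnum into memory."""
        if len(self.pagemap) >= self.max_resident:
            old, page = self.pagemap.popitem(last=False)  # least recent
            start = old * self.page_size
            self.backing[start:start + self.page_size] = page
        start = pagnum * self.page_size
        self.pagemap[pagnum] = self.backing[start:start + self.page_size]
        self.faults += 1

    def _locate(self, elenum):
        pagnum, offset = divmod(elenum, self.page_size)
        if pagnum not in self.pagemap:
            self._pagefault(pagnum)
        self.pagemap.move_to_end(pagnum)  # mark as most recently used
        return self.pagemap[pagnum], offset

    def get(self, elenum):
        page, offset = self._locate(elenum)
        return page[offset]

    def put(self, elenum, value):
        page, offset = self._locate(elenum)
        page[offset] = value


store = PagedStore(list(range(16)), page_size=4, max_resident=2)
```

The element number passed to `get` and `put` would be the sum of the two table entries, exactly as in GETELEMENT and PUTELEMENT above.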


APPENDIX  III 

Machine  Specific  Implementations  of  Image  Access 

We now consider implementations of this
scheme for both the VAX and the KL-10, with two
different assumptions: whether the entire image
will or will not fit into an acceptable fraction of
the computer's virtual memory space.

The implementer is encouraged to expend
maximum effort to make the two element access
functions GETELEMENT and PUTELEMENT as efficient as
possible, either through machine coded subroutines,
in-line code generation from within the host
programming language, or code generation by the
NEWXARRAY generation procedure. If they are
efficiently implemented, then researchers
developing new algorithms will not be tempted to
resort to special representations or coding tricks
just to get more efficiency, and the resulting
programs will be clearer and more transportable to
any sites which implement these semantics. A
proper implementation of these access primitives
should be as efficient as most host languages will
produce for standard 2-d array access.

We will first consider a VAX virtual memory
implementation for variable pixel sizes. A
MAINSAIL language definition of pixel access
using the VAX variable bit field manipulation
primitives could be written as:

INTEGER PROCEDURE getpix(
    POINTER(image) p;
    INTEGER x,y);
  RETURN(extract-field(
      p.bpp*(p.xtab[x]+p.ytab[y]),
      p.bpp, p.picbase));

PROCEDURE putpix(
    POINTER(image) p;
    INTEGER x,y,v);
  insert-field(v,
      p.bpp*(p.xtab[x]+p.ytab[y]),
      p.bpp, p.picbase);

where extract-field and insert-field correspond to
the VAX variable bit field instructions.

A pair of very efficient VAX assembly
language code sequences to access pixels using the
MAINSAIL record storage conventions can be written
as follows. These sequences assume that the
pointer to the picture record is in register RP,
and the image coordinates are in registers RX and
RY respectively.

GETPIX: ADDL3 @XTABO(RP)[RX],@YTABO(RP)[RY],RTMP
              ; computes pixel number in image
        MULL2 BPP(RP),RTMP
              ; computes bit offset
        EXTZV RTMP,BPP(RP),PICBASE(RP),R1
              ; extract the pixel into R1

Note that XTABO and YTABO are offsets into the
IMAGE record which contain the virtual origins of
the XTAB and YTAB tables. That is, the longword at
XTABO+X corresponds to XTAB[x]. Also note that
only one ADDL3 instruction is needed to compute the
pixel offset.

PUTPIX: ADDL3 @XTABO(RP)[RX],@YTABO(RP)[RY],RTMP
              ; pixel number in image
        MULL2 BPP(RP),RTMP
              ; compute bit offset
        INSV  RVAL,RTMP,BPP(RP),PICBASE(RP)
              ; insert the pixel value in RVAL

The instructions shown in these procedures
require only 8 data references plus 18 bytes of
program reference. The overhead of a CALLS type
procedure call is certain to be more costly than
the pixel access itself. Consequently, it is
advised that either in-line code be generated for
pixel access, or that a JSB type procedure call
passing arguments in registers be used.


The MULL2 multiply instruction can be
eliminated by multiplying the values stored in the
xtab and ytab tables by bpp as follows:

xtab[x] _ xtab[x] * bpp
ytab[y] _ ytab[y] * bpp

This scheme has been tested and requires
about 9 microseconds per access with the MULL2
instruction and about 7 microseconds per access
otherwise. For accesses to byte, word or longword
pixels the EXTZV instruction can be replaced with
the appropriate MOV instruction. This reduces the
pixel access time to about 4 microseconds.
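
The bit-field access with premultiplied tables can be mimicked in a few lines. The sketch below uses a Python integer as the bit store in place of memory at picbase; `extract_field` and `insert_field` only imitate the behaviour of the VAX variable bit field instructions, and the 4 x 4 image with 3 bits per pixel is an arbitrary choice for illustration.

```python
def extract_field(bits, pos, size):
    """Zero-extended field of `size` bits starting at bit `pos`."""
    return (bits >> pos) & ((1 << size) - 1)


def insert_field(bits, pos, size, value):
    """Return `bits` with the field at `pos` replaced by `value`."""
    mask = ((1 << size) - 1) << pos
    return (bits & ~mask) | ((value << pos) & mask)


def premultiply(tab, bpp):
    """Fold the bits-per-pixel multiply into the table entries."""
    return [entry * bpp for entry in tab]


width, height, bpp = 4, 4, 3
xtab = list(range(width))
ytab = [y * width for y in range(height)]
xbits, ybits = premultiply(xtab, bpp), premultiply(ytab, bpp)

picture = 0
for y in range(height):
    for x in range(width):
        picture = insert_field(picture, xbits[x] + ybits[y], bpp,
                               (x + 4 * y) % 8)


def getpix(x, y):
    # One addition of two table entries gives the bit offset directly.
    return extract_field(picture, xbits[x] + ybits[y], bpp)
```

With the multiplication folded into the tables, each access reduces to two lookups, one addition and one field operation, which corresponds to dropping the multiply from the sequences above.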


DEC  PDP-10  series  software  paged  implementation 


The PDP-10 implementation is defined in
SAIL. An image record class is declared as
follows:


class pix(
    INTEGER sx;            # image width
    INTEGER sy;            # image height
    INTEGER bpp;           # bits per pixel
    INTEGER ARRAY xtab;    # x mapping table
    INTEGER ARRAY ytab;    # y mapping table
    INTEGER ARRAY ptrtab;  # byte pointers TABLE
    INTEGER ARRAY pagtab;  # the mapping table
    INTEGER pagsiz;        # words per page
    INTEGER fileid;        # disk file id
    INTEGER gtpix;         # entry to code
    INTEGER ptpix;         # entry to code
);


Each image has a page table in its image
descriptor record which encodes for each page the
following information:

pagtab[page] = 0     : page is not in memory
             = vpage : virtual address of page

Each image also has a table of bytepointers,
one for each pixel in an image page. This
table can be shared between all images with the same
pixel size. This table is constructed as follows:

ptr _ point(bpp,0,bpp-1) + 4 lsh 18;

This constructs a bytepointer to the first byte in
the word indexed by register 4 with byte size bpp.

for i _ 1 step 1 until pagsiz do
  begin ptrtab[i] _ ptr; ibp(ptr); end;

where IBP increments the bytepointer to the next
byte.


A  procedure  PAGEFAULT  is  used  to  manage 
the  paging  tables  and  perform  the  transfer  of  data 
between  the  virtual  memory  and  the  file  system.  It 
is  assumed  that  the  read  and  write  access  modes  are 
enforced  by  the  pagefault  procedure.  For  brevity, 
PAGEFAULT  is  not  defined  in  this  document. 

The optimal PDP-10 code sequence is
obtained by generating a version of the getpix and
putpix procedures for each image using the
following PDP-10 assembly language sequences:

; gtpix assumes the registers are setup as follows:
; ac1: x
; ac2: y
; ac5: pointer to pic record
; ac6: return address
; result is returned in ac1

GTPIX: MOVE  4,XTABBASE(1)
       ADD   4,YTABBASE(2)    ; element number
       HLRZ  2,4              ; offset (see note 1)
       SKIPN 4,MAPTABBASE(4)  ; get virtual addr
       PUSHJ P,PAGEFAULT      ; page not there
       LDB   1,PTRTABBASE(2)  ; get the pixel
                              ; byte pointers index by AC4
       JRST  @6

; ptpix assumes the registers are setup as follows:
; ac1: x
; ac2: y
; ac3: value to store
; ac5: pointer to pic record
; ac6: return address
; result is returned in ac1

PTPIX: MOVE  4,XTABBASE(1)
       ADD   4,YTABBASE(2)    ; element number
       HLRZ  2,4              ; offset (note 1)
       SKIPN 4,MAPTABBASE(4)  ; get virtual addr
       PUSHJ P,PAGEFAULT      ; page not there
       DPB   3,PTRTABBASE(2)  ; store the pixel
                              ; byte pointers index by AC4
       JRST  @6               ; return

These optimised procedures are attached to
the image record fields GTPIX and PTPIX
respectively, and are not to be called directly
from SAIL, but instead from these procedures:

INTEGER PROCEDURE getpix(POINTER(pix) pic;
                         INTEGER x,y);
START!CODE
  POP P,6;        # return address
  POP P,2;        # y
  POP P,1;        # x
  POP P,5;        # pic
  JRST @gtpixoffset(4)
END;

PROCEDURE putpix(POINTER(pix) pic;
                 INTEGER x,y,val);
START!CODE
  POP P,6;        # return address
  POP P,3;        # val
  POP P,2;        # y
  POP P,1;        # x
  POP P,5;        # pic
  JRST @ptpixoffset(4)
END;

Note 1

For images which have a power of 2 pixels
per page, the xtab, ytab tables can be constructed
so that no divide is required to compute the page
number and offset within page. This is
accomplished by modifying the access tables as
follows:

modified-tab[i] _ tab[i] DIV pagsiz              # page
                + (tab[i] MOD pagsiz) LSH 18;    # offset
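
A short sketch of the packed tables of Note 1 follows. It assumes an 18-bit halfword and a block layout in which one block is exactly one page, so that the two within-page offsets never sum past a page; that no-carry condition is an assumption of this illustration, and the table contents and function names are hypothetical.

```python
HALFWORD_BITS = 18


def pack_table(tab, pagsiz):
    """Page number in the right halfword, offset in the left halfword."""
    return [(t // pagsiz) + ((t % pagsiz) << HALFWORD_BITS) for t in tab]


def page_and_offset(mod_x, mod_y, x, y):
    """One addition yields both fields; no divide is needed per access."""
    word = mod_x[x] + mod_y[y]
    return word & ((1 << HALFWORD_BITS) - 1), word >> HALFWORD_BITS


# 16 x 16 image, pages of 32 pixels laid out as blocks 8 wide and 4 high.
pagsiz = 32
xtab = [(x % 8) + (x // 8) * pagsiz for x in range(16)]
ytab = [(y % 4) * 8 + (y // 4) * pagsiz * 2 for y in range(16)]
mod_x, mod_y = pack_table(xtab, pagsiz), pack_table(ytab, pagsiz)
```

The right half plays the role of the index used by SKIPN on the page table, and the left half is what HLRZ extracts as the offset into the byte pointer table.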


Both SAIL and MAINSAIL packages have been
implemented for access to variable pixel size,
software mapped image files. The following
loop was used to determine the pixel access
times:

FOR Y _ 0 STEP 1 UNTIL 255 DO
  FOR X _ Y STEP 1 UNTIL 255 DO
    BEGIN
      INTEGER V1, V2;
      GETPIX(PIC,X,Y,V1);
      GETPIX(PIC,Y,X,V2);
      PUTPIX(PIC,X,Y,V2);
      PUTPIX(PIC,Y,X,V1);
    END

The total CPU time on a KL-10 for a 256x256
image of 8 bits per pixel was 1 second, or about 8
microseconds per access including the loop
overhead. Note that the GETPIX and PUTPIX
functions used here were SAIL in-line macros rather
than procedure calls.

The following loop, which accesses the
elements of a normal 2-d SAIL array in strict
storage order, also required about 1 second to
execute, or about 8 microseconds per access:

FOR K _ 1 STEP 1 UNTIL 4 DO
  FOR I _ 0 STEP 1 UNTIL 256 DO
    FOR J _ 0 STEP 1 UNTIL 256 DO
      RESULT _ A[I,J];

One  concludes  from  these  examples  that  if 
the  pixel  access  functions  are  carefully 
implemented,  then  there  is  little  or  no  cost  in 
time  over  conventional  array  accesses. 




SOME  USES  OF  PYRAMIDS 
IN  IMAGE  PROCESSING  AND  SEGMENTATION 


Azriel Rosenfeld


Computer  Vision  Laboratory,  Computer  Science  Center 
University  of  Maryland 
College  Park,  MD  20742 


ABSTRACT 

This  paper  summarizes  recent  applications  of 
multi-resolution  ("pyramid")  image  representations 
in  image  analysis  and  processing,  including 

a) Approximation of Gaussian-kernel convolution
operators by iterated local convolutions

b) Construction of overlapped pyramids to
reduce shift sensitivity

c) Clustering of pyramid nodes into trees
representing image regions

d) Use of local operations in a pyramid to
detect blob-like and streak-like objects
in the image

e) Use of pyramids to define sets of quadtree
approximations to the image ("Q-images")

f) Image segmentation by analysis of the
histograms of Q-images

g) Improved edge detection based on associations
between edges in the image and a
corresponding Q-image.


INTRODUCTION 

Pyramid structures, that is, sequences of
arrays of exponentially reduced resolution derived
from a given image, have been used by a number of
investigators [e.g., 1-6] for image processing and
analysis. The simplest pyramid construction
scheme assumes that the input image is $2^n$ by $2^n$
and reduces resolution by a factor of 2 at each
step using 2-by-2 block averaging, so that the
pyramid has n+1 levels of sizes $2^n \times 2^n$,
$2^{n-1} \times 2^{n-1}$, ..., $2 \times 2$, $1 \times 1$.
This paper describes some generalizations of the
pyramid construction process, and briefly presents
some applications of pyramid structures to image
processing and segmentation. Further details can
be found in individual technical reports.
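
The simple block-averaging construction can be sketched in a few
lines of Python. This is only an illustration, not code from any of
the cited reports; the function name `build_pyramid` and the
list-of-rows image layout are assumptions made for the example.

```python
def build_pyramid(image):
    """Return [level 0, level 1, ...] for a square image whose side is a power of two.

    Level 0 is the image itself (as floats); each higher level is obtained
    by averaging disjoint 2-by-2 blocks of the level below.
    """
    size = len(image)
    if size == 0 or size & (size - 1) or any(len(row) != size for row in image):
        raise ValueError("image must be square with a power-of-two side")
    levels = [[[float(v) for v in row] for row in image]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [(prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
              + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(half)]
            for y in range(half)
        ])
    return levels
```

For a $2^n$ by $2^n$ input the returned list has n+1 entries, and the
single value at the top level is the mean gray level of the image.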


GAUSSIAN  CONVOLUTIONS 

In general, pyramids can be constructed using
weighted averages over neighborhoods of any
desired size, which can be allowed to overlap.
If we require the weighting function to satisfy
certain very natural constraints, it turns out
that the weight patterns after a few iterations
become almost exactly Gaussian. For simplicity,
we first illustrate this for the case of a
one-dimensional array f(x) and a weighting function of
odd width ($w(i)$ for $-m \le i \le m$).

Let $g_0(n) = f(n)$ and

$$g_\ell(n r^\ell) = \sum_{i=-m}^{m} w(i)\, g_{\ell-1}(n r^\ell + i r^{\ell-1})$$

for $\ell \ge 1$; thus each $g_\ell$ is a weighted
average of $2m+1$ $g_{\ell-1}$'s spaced $r^{\ell-1}$ apart. We
adopt the following simple constraints on the w's:

(1) $\sum_{i=-m}^{m} w(i) = 1$

(2) $w(-i) = w(i)$, $1 \le i \le m$

(3) $0 \le i \le j$ implies $0 \le w(j) \le w(i)$

(4) $\sum_{i=-m}^{m} w(k+ir) = \frac{1}{r}$, $0 \le k < r$
(taking $w(j) = 0$ for $|j| > m$)

Constraints (2-3) require the w's to have a symmetric
central peak, and constraint (4) ensures
that each f(n) (except near the array border) contributes
with equal weight at every level $\ell$. Evidently,
each $g_\ell$ is the convolution of f with an
"equivalent kernel" $h_\ell$, obtained by expanding the
definition of $g_\ell$ until it is expressed in terms of
w's and f's.

For example, let m = 2, r = 2, and w(0) = a,
w(1) = b, w(2) = c (see Figure 1a). Then constraints
(1-4) yield $1/4 \le a \le 1/2$, $b = 1/4$, and
$c = 1/4 - a/2$. Figure 1b shows the equivalent
convolution kernels $h_\ell$ using a = 0.4 for $\ell$ =
1, 2, 3, and for large $\ell$; note that the shape of $h_\ell$
rapidly approaches a "Gaussian" (the discrepancy
from a Gaussian is in fact very small). The approximation
to a Gaussian remains quite good for a
in the range 0.3 to 0.45, with the Gaussian becoming
more sharply peaked as a increases (Fig. 1c).

Similar results can be obtained for w of even width and
for non-integer r. The generalization to two
dimensions is also straightforward, especially if
we take the kernel w to be separable. By taking
linear combinations of Gaussian convolutions, one
can construct operators of many sizes that respond
to spots, edges, or bars. Gaussian convolutions
can be computed in the pyramid more efficiently
than with the FFT, typically by two or three orders
of magnitude. For the details on all of these
matters see [7].
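
The approach to a Gaussian can be checked numerically. The sketch
below assumes the example case m = 2, r = 2 with b = 1/4 and
c = 1/4 - a/2. It builds the level-$\ell$ equivalent kernel by
repeatedly convolving with the generating kernel spread
$r^{\ell-1}$ samples apart, then reports the largest deviation from
a Gaussian of equal variance as a fraction of the kernel's peak. The
function names are illustrative and are not taken from [7].

```python
import math


def generating_kernel(a):
    """Five-point kernel [c, b, a, b, c] with b = 1/4 and c = 1/4 - a/2."""
    b = 0.25
    c = 0.25 - a / 2.0
    return [c, b, a, b, c]


def convolve(u, v):
    """Full discrete convolution of two lists."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


def equivalent_kernel(w, r, level):
    """Kernel h such that the level-`level` values are f convolved with h."""
    h = [1.0]
    for l in range(1, level + 1):
        step = r ** (l - 1)
        # Place the weights of w `step` samples apart, zeros in between.
        spread = [0.0] * ((len(w) - 1) * step + 1)
        spread[::step] = w
        h = convolve(h, spread)
    return h


def gaussian_mismatch(h):
    """Largest |h - Gaussian| divided by max(h), for a Gaussian of equal variance."""
    center = (len(h) - 1) / 2.0
    var = sum(p * (k - center) ** 2 for k, p in enumerate(h))
    gauss = [math.exp(-(k - center) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
             for k in range(len(h))]
    return max(abs(p - q) for p, q in zip(h, gauss)) / max(h)


if __name__ == "__main__":
    for level in (1, 2, 3, 4):
        h = equivalent_kernel(generating_kernel(0.4), 2, level)
        print(level, len(h), gaussian_mismatch(h))
```

Comparing against a sampled Gaussian of equal variance is one of
several reasonable ways to quantify the discrepancy; the measure used
in [7] may differ.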


OVERLAPPED  PYRAMIDS 

The general pyramid construction process
described above allows the averaging neighborhoods
to overlap; e.g., when m = r = 2, level 1 is constructed
from level 0 using neighborhoods of width
2m + 1 = 5 spaced 2 apart, and similarly at higher
levels. Note that in spite of the overlap, the
pyramid still tapers exponentially.

An advantage of overlapped pyramids, as compared
to the standard, unoverlapped block averaging
scheme, is reduced sensitivity to feature position.
When local feature detectors are applied at various
levels of a nonoverlapped pyramid, their ability to
detect large features in the image depends strongly
on the positions of these features. For example,
suppose we want to detect bloblike objects of
diameter ~4. If such an object is located in a
position whose coordinates are multiples of 4, it
will (approximately) be contained in one of the 4x4
blocks whose averages are at level 2 of the pyramid,
and the neighboring 4x4 blocks will be
(approximately) disjoint from it, so that a local
spot detector applied at level 2 will respond to it.
On the other hand, if the coordinates are odd
multiples of 2, the object overlaps four adjacent 4x4
blocks, whose averages will respond only weakly to
its presence. This position dependence is reduced if
we use an overlapped pyramid; e.g., in one dimension,
if the values at successive layers represent averages
of 50% overlapped intervals:

Level    Intervals

  1      [1,2]; [2,3]; [3,4]; ...

  2      [1,4]; [3,6]; [5,8]; ...

  3      [1,8]; [5,12]; [9,16]; ...

Here a spot detector would be based on adjacent
nonoverlapping blocks, e.g. [5,8] vs [1,4] and
[9,12]; and the center block has a stronger response
than both adjacent blocks even if the object is
not in the optimal position. Note that in this
simple example of an overlapped pyramid, level 1
is nearly the same size as level 0, but the
exponential shrinking begins at level 2.
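
The one-dimensional example can be made concrete with a short Python
sketch. It lists the 50% overlapped intervals at a given level and
compares the best spot response at one level with and without
overlap. The spot detector used here (center average minus the mean
of the two adjacent nonoverlapping block averages) and all function
names are assumptions chosen for illustration.

```python
def overlapped_intervals(level, length):
    """1-based closed intervals averaged at `level` (>= 1) of the overlapped pyramid."""
    width = 2 ** level
    step = width // 2
    return [(start, start + width - 1)
            for start in range(1, length - width + 2, step)]


def block_mean(signal, lo, hi):
    """Mean of signal over the 1-based closed interval [lo, hi]."""
    return sum(signal[lo - 1:hi]) / (hi - lo + 1)


def best_spot_response(signal, level, overlapped):
    """Largest value of center mean minus the mean of its two adjacent blocks."""
    width = 2 ** level
    step = width // 2 if overlapped else width
    best = 0.0
    # The center block needs a full nonoverlapping neighbor on each side.
    for start in range(width + 1, len(signal) - 2 * width + 2, step):
        center = block_mean(signal, start, start + width - 1)
        left = block_mean(signal, start - width, start - 1)
        right = block_mean(signal, start + width, start + 2 * width - 1)
        best = max(best, center - 0.5 * (left + right))
    return best


if __name__ == "__main__":
    # A blob of width 4 occupying positions 7..10, straddling two level-2 blocks.
    signal = [0] * 6 + [1] * 4 + [0] * 6
    print(overlapped_intervals(2, 16))
    print(best_spot_response(signal, 2, overlapped=False))
    print(best_spot_response(signal, 2, overlapped=True))
```

In this example the blob straddles the nonoverlapped blocks [5,8] and
[9,12], whereas the overlapped pyramid contains the interval [7,10],
which covers it exactly.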


REGION  EXTRACTION  BY  PYRAMID  NODE  CLUSTERING 

A number of image smoothing algorithms have
been developed in which each pixel is averaged with
a subset of its neighbors chosen so as to make it
likely that the neighbors in the subset all belong
to the same image region as the given pixel; e.g.,
we can use the neighbors whose gray levels most
resemble that of the pixel, or we can examine a
set of asymmetric neighborhoods of the pixel and
choose one whose average gray level most resembles
that of the pixel. It would be of interest to
generalize this concept to a pyramid environment;
this might involve associating nodes at a given
level with "neighboring" nodes at the next higher
level that appear to belong to the same region,
and so on. If this could be done, the averaging
process could propagate across a region much more
quickly (in a number of iterations proportional to
the log of the region diameter), as the largest
blocks contained in the region become associated
with the smaller blocks near the region's borders.

The following is a simple scheme for linking
blocks in an overlapped pyramid (based on 4x4
blocks with 50% overlap) so as to obtain linked
clusters of blocks representing two or three
types of regions. The pyramid is initialized by
performing the successive overlapped 4x4 averages.
At the top level, which we take to be 2x2, the
resulting four averages are grouped into two or
three classes, depending on the desired number of
regions and on their expected relative sizes, and
each group is given a single average value. Each
block B, say at level $\ell$, contributes to the
averages of several (overlapped) "father blocks" at
level $\ell+1$ (four, in our experiments); we now link
block B to that father block (or group, if B is
just below the top level) whose average is closest
to B's. Next, we recompute the block averages,
with each block at level $\ell+1$ using only the
averages of the blocks at level $\ell$ linked to it to
compute its new average. Since this changes the
block averages, the links may now change, so we
can repeat the entire process. Typically, the
process stabilizes after 10 to 20 iterations; at
that stage, the linked sets of blocks define a
good decomposition of the image into regions.
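
The link-and-recompute cycle can be illustrated with a deliberately
simplified one-dimensional analogue in Python. The simplifications
are assumptions made for brevity and are not part of the scheme as
described: each father averages four children with 50% overlap (so
each child has at most two candidate fathers instead of four), the
top level has two nodes and each is treated as its own class, the
recomputed averages are unweighted, a father with no linked children
keeps its old value, and ties go to the lower-indexed father.

```python
def build_overlapped(signal, top_size=2):
    """Initial averages: father j covers children 2j-1 .. 2j+2 (clipped at the ends)."""
    levels = [[float(v) for v in signal]]
    while len(levels[-1]) > top_size:
        prev = levels[-1]
        cur = []
        for j in range(len(prev) // 2):
            kids = [i for i in range(2 * j - 1, 2 * j + 3) if 0 <= i < len(prev)]
            cur.append(sum(prev[i] for i in kids) / len(kids))
        levels.append(cur)
    return levels


def candidate_fathers(i, n_fathers):
    """Indices of the fathers whose averaging interval contains child i."""
    first = (i - 1) // 2
    return [j for j in (first, first + 1) if 0 <= j < n_fathers]


def link_children(kids, dads):
    """Link every child to the candidate father with the closest value."""
    links = []
    for i, value in enumerate(kids):
        fathers = candidate_fathers(i, len(dads))
        links.append(min(fathers, key=lambda j: abs(dads[j] - value)))
    return links


def segment(signal, iterations=20):
    """Return, for each sample, the value of the top-level node it is linked to."""
    levels = build_overlapped(signal)
    links = []
    for _ in range(iterations):
        # Step 1: relink every level using the current averages.
        links = [link_children(levels[l], levels[l + 1])
                 for l in range(len(levels) - 1)]
        # Step 2: recompute averages bottom-up from linked children only.
        for l, link in enumerate(links):
            kids, dads = levels[l], levels[l + 1]
            for j in range(len(dads)):
                mine = [kids[i] for i in range(len(kids)) if link[i] == j]
                if mine:
                    dads[j] = sum(mine) / len(mine)
    out = []
    for i in range(len(signal)):
        node = i
        for link in links:
            node = link[node]
        out.append(levels[-1][node])
    return out
```

A fixed iteration count stands in for a convergence test; stopping as
soon as no link changes would be an equally reasonable choice.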

Figure 2a shows an example (a 64x64 FLIR image
of a tank) in which, at the 2x2 level, three of
the averages were grouped together, while the
fourth one defined its own class. The original
pyramid is shown at the lower left, while the
upper nine images show the average gray level
of the group to which each pixel is linked at
iterations 1, 2, 3, 4, 5, 6, 7, 14, and 26. Note
that at iteration 1 the tank is nearly invisible,
since the averages at the 2x2 level are initially
nearly the same. The results are sensitive to
the number of classes used at the 2x2 level, and
to the sizes of these classes; Figure 2b shows
corresponding results when the four averages at
the 2x2 level were grouped into two classes of two
each (the "tank" class is too large), and Figure
2c shows results when three classes were used at
the 2x2 level. The results are also sensitive to
noisiness in the original image; in Figures 2a-c,
the input image was blurred before building the
pyramid, and Figure 2d shows what happens when
this is not done (in the case where two groups of
two each are used at the 2x2 level). Finally,
Figure 2e shows a 3-class result for a picture
of a blood cell; it yields a good segmentation
into nucleus, cell body, and background. Further
experiments with this approach are in progress,
and will be documented in a forthcoming technical
report.


BLOB  AND  STREAK  DETECTION 


Pyramid representations provide a method of
using simple types of shape information in the
segmentation process, rather than first segmenting
and then editing the segmentation to yield regions
of the desired shapes. For example, if we want to
extract blob-like (compact) objects, we can apply
local spot detection operators at each level of
the pyramid (if the size of the desired objects is
known, we can restrict this to a few levels), and
bias the segmentation process to favor object
extraction in those parts of the image corresponding
to the positions of detected spots. A simple way
of doing this [8] is to define a local threshold
at the position of each detected spot, e.g. midway
between the average gray levels of the spot's
center and surround; this threshold should be
effective for extracting the object at that position.
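
A minimal Python sketch of the local-threshold idea follows. The spot
detector used here (a block average compared with the mean of its
eight neighbors at the same pyramid level) and the `min_contrast`
parameter are assumptions for illustration; the operators actually
used are described in [8].

```python
def spot_thresholds(level, min_contrast):
    """Find spots at one pyramid level and propose a local threshold for each.

    `level` is a 2-D list of block averages. A spot is reported at (y, x)
    when the block differs from the mean of its eight neighbors by at least
    `min_contrast`; the proposed threshold is midway between the two.
    """
    found = {}
    for y in range(1, len(level) - 1):
        for x in range(1, len(level[0]) - 1):
            center = level[y][x]
            ring = [level[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0)]
            surround = sum(ring) / 8.0
            if abs(center - surround) >= min_contrast:
                found[(y, x)] = (center + surround) / 2.0
    return found
```

Each returned threshold would then be applied to the full-resolution
image only in the region covered by the corresponding block and its
neighbors.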

An analogous approach can be used to favor the
extraction of streak-like objects, i.e., linear
features of arbitrary thickness. Such objects give
rise to thin lines at appropriate levels of the
pyramid. Results are improved if a local line
enhancement process is applied to the line detection
outputs at each pyramid level. Figure 3 shows the
results of applying this process to portions of
aerial photographs showing several Maryland airports.
These results were obtained with a nonoverlapped
pyramid; they improve when an overlapped
pyramid is used. A further possibility for
improvement is to use the pyramid line detections
to initialize linear feature tracking processes in
the image, rather than simply thresholding.


QUADTREE  APPROXIMATIONS  TO  AN  IMAGE 

The quadtree representation of a binary image
(of size $2^n \times 2^n$) is constructed by repeated
subdivision into quadrants; a block is subdivided
unless it consists entirely of 1's or of 0's. More
generally, given a criterion for the homogeneity of
a block, we can use a similar process to construct
quadtree approximations to an image, by subdividing
blocks into quadrants unless they are homogeneous
[4]. For example, suppose that we call a block
homogeneous if its standard deviation is less than
some threshold, and if so, we represent the block
by its mean. When we construct a pyramid
representation of an image by the standard 2x2
averaging process, we can compute the standard deviation
as well as the mean of each block; for any
threshold t, we thus have implicitly defined a
quadtree approximation to the image, where a block
is regarded as a leaf node of the tree if its
standard deviation does not exceed t.
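
As an illustration, the following Python sketch builds a zero-order
quadtree approximation top-down, using the standard deviation
criterion. It recomputes block statistics directly from the pixels,
which is simple but wasteful compared with reading them from a
pyramid; the function names and the leaf format (y, x, size, mean)
are assumptions made for the example.

```python
def block_stats(image, y, x, size):
    """Mean and standard deviation of the size-by-size block at (y, x)."""
    values = [image[y + dy][x + dx] for dy in range(size) for dx in range(size)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var ** 0.5


def quadtree_leaves(image, t, y=0, x=0, size=None):
    """Leaves (y, x, size, mean) of the quadtree with homogeneity threshold t."""
    if size is None:
        size = len(image)
    mean, std = block_stats(image, y, x, size)
    if size == 1 or std <= t:
        return [(y, x, size, mean)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_leaves(image, t, y + dy, x + dx, half)
    return leaves


def q_image(image, t):
    """Full-resolution image in which every leaf block is filled with its mean."""
    out = [[0.0] * len(image) for _ in image]
    for y, x, size, mean in quadtree_leaves(image, t):
        for dy in range(size):
            for dx in range(size):
                out[y + dy][x + dx] = mean
    return out
```

With t = 0 the approximation reproduces the image exactly; raising t
merges more of the image into large constant blocks.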

Defining homogeneity in terms of the standard
deviation gives us a zero-order piecewise
least-squares approximation to the image, since the
constant which best approximates an image block in
the least squares sense is the block's mean, and
the corresponding least-squares error is the
variance. More generally, we can define piecewise
least-squares approximations by polynomials of
degree k, for k = 1, 2, ..., and call a block
homogeneous if the error in its degree-k
least-squares approximation does not exceed a given
threshold; this allows us to define higher-order
quadtree approximations. (Haralick [10] has shown
that first-order approximations have significant
advantages over zero-order approximations for
purposes such as image smoothing and edge detection.)


Least-squares approximations
(of any given degree) to an image can be computed
hierarchically. If we know these approximations,
and their associated errors, on the subblocks of a
block, we can compute the approximation and its
error for the entire block directly from this
information, without examining the original image
gray levels. This allows us to construct
higher-order quadtree approximations "bottom-up",
starting from small blocks (e.g., for which the
approximation of the given degree is exact), and
computing the approximations for larger blocks until
a given error threshold is exceeded. The details
of this procedure can be found in [11].
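
For the zero-order case the hierarchical computation is elementary,
and can be sketched as follows in Python. The function name
`merge_stats` is illustrative; the higher-degree procedure of [11] is
not reproduced here.

```python
def merge_stats(children):
    """Combine the (mean, variance) pairs of four equal-sized quadrants.

    The parent's mean is the average of the child means, and its variance
    is the average of the child second moments (variance + mean^2) minus
    the square of the parent mean. No pixel values are needed.
    """
    mean = sum(m for m, _ in children) / 4.0
    second_moment = sum(v + m * m for m, v in children) / 4.0
    return mean, second_moment - mean * mean
```

Starting from single pixels, each with variance 0, repeated merging
yields the mean and variance of every block in the pyramid.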


QUADTREE APPROXIMATIONS AS AIDS TO THRESHOLDING

Suppose we have constructed a zero-order quadtree
approximation to a given image; this approximation,
which we call a "Q-image", consists of a
set of blocks (corresponding to the leaves of the
quadtree) each having constant gray level (the
mean gray level of the corresponding image block,
which had a below-threshold standard deviation,
and so can be approximated by its mean).

If the image consists of homogeneous objects
on a homogeneous background, the Q-image should
contain large blocks in the interiors of the objects
and background, as well as smaller blocks
near their borders. The image histogram should
consist of two peaks representing the object and
background gray level populations, but these peaks
may overlap due to the existence of intermediate
gray levels in the border zones. In the histogram
of the Q-image (for brevity: the Q-histogram), the
peaks should be sharper, since averaging reduces
gray level variability; this expectation is confirmed
by the example in Figure 4. Note that the
Q-histogram can be computed rapidly from the quadtree
approximation, by counting each leaf node a
number of times equal to its area.


Further improvement in the Q-histogram can be
obtained by suppressing the contributions of small
blocks; this tends to eliminate border gray levels
while retaining levels that lie interior to the
objects or background. Conversely, if we consider
the small blocks only, we should obtain a unimodal
histogram representing border gray levels, and the
mean of this histogram should be a good threshold
for separating the objects from their background.
Examples of the results of suppressing small
blocks or using only small blocks are shown in
Figure 4 [12].
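
The Q-histogram and its two variants can be sketched in Python as
follows. Leaves are assumed to be given as (size, gray) pairs, where
size is the side length of the leaf block; the size cut-offs
`min_size` and `max_size` are parameters introduced for this example
and are not values taken from [12].

```python
def q_histogram(leaves, min_size=1, max_size=None):
    """Histogram {gray level: pixel count} of a Q-image given its leaves.

    Each leaf contributes its area (size * size) to the bin of its rounded
    mean gray level. Leaves smaller than `min_size` or larger than
    `max_size` are ignored, which allows small blocks to be suppressed
    or used exclusively.
    """
    hist = {}
    for size, gray in leaves:
        if size < min_size or (max_size is not None and size > max_size):
            continue
        level = int(round(gray))
        hist[level] = hist.get(level, 0) + size * size
    return hist


def histogram_mean(hist):
    """Mean gray level of a non-empty histogram."""
    total = sum(hist.values())
    return sum(level * count for level, count in hist.items()) / total


if __name__ == "__main__":
    leaves = [(4, 10), (4, 50), (1, 28), (1, 32)]
    print(q_histogram(leaves))                              # all leaves
    print(q_histogram(leaves, min_size=2))                  # small blocks suppressed
    print(histogram_mean(q_histogram(leaves, max_size=1)))  # candidate threshold
```

In the toy example the two large leaves form the object and
background peaks, and the mean of the small-leaf histogram falls
between them.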




More generally, if there are several peaks in
the Q-histogram, they should correspond to different
types of homogeneous regions in the image;
note that because of the peak sharpening effect,
such peaks will be easier to detect in the
Q-histogram than in the original histogram. Whenever
we detect such a peak, we can find the
blocks in the Q-image that give rise to it, and
histogram the image gray levels in the vicinity
of these blocks; this allows us to define a local
threshold suitable for extracting the regions
represented by the peak. If desired, the threshold
can be adjusted to yield a good match between the
borders of the extracted regions and the edges in
the image, as in the SUPERSLICE algorithm (but
without the need for connected component analysis).
Some examples of images segmented in this way are
shown in Figure 5 [13].


QUADTREE  APPROXIMATIONS  AS  AIDS  TO  EDGE  DETECTION 

If we apply an edge operator to a Q-image, we
obtain edges that are generally stronger than
those in the original image (since the regions on
the two sides of the edge have reduced gray level
variability because of the averaging), but that are
displaced from their proper positions, since they
are constrained to lie on the cracks between
blocks. (It should be pointed out that edge operators
can be applied directly to the quadtree,
using neighbor-finding techniques to determine the
gray level differences between adjacent pairs of
blocks; this should be much less costly than
constructing a full-resolution Q-image and applying
edge operators at every pixel.) If we can associate
these Q-edges with edges in the original
image, it should be possible to obtain edge maps
representing significant edges (as obtained from
the Q-image) in accurate positions (as obtained
from the original image).

Several methods of associating edges with
Q-edges are investigated in [14]; the details will
not be given here. Figure 6 shows an example using
an iterative enhancement process, based on the
Q-edges, to selectively enhance edges on the original
image, with very good results.


CONCLUDING  REMARKS 

The results in this paper illustrate a few of
the ways in which pyramid representations of
images can be useful in image processing and analysis.
Further work in this area is in active
progress.


REFERENCES 

1. L. Uhr, Layered "recognition cone" networks
that preprocess, classify, and describe,
IEEETC-21, 1972, 758-768.

2. S. L. Tanimoto and T. Pavlidis, A hierarchical
data structure for picture processing, CGIP 4,
1975, 104-119.

3. S. L. Tanimoto, Pictorial feature distortion in
a pyramid, ibid. 5, 1976, 333-352.

4. T. Pavlidis, Structural Pattern Recognition,
Springer, New York, 1977.

5. E. M. Riseman and M. A. Arbib, Computational
techniques in the visual segmentation of
static scenes, CGIP 6, 1977, 221-276.

6. A. R. Hanson and E. M. Riseman, Segmentation
of natural scenes, in A. R. Hanson and E. M.
Riseman, eds., Computer Vision Systems,
Academic Press, New York, 1978.

7. P. J. Burt, Fast, hierarchical correlations
with Gaussian-like kernels, TR-860, Computer
Vision Laboratory, University of Maryland,
College Park, MD, January 1980.

8. M. Shneier, Using pyramids to define local
thresholds for blob detection, TR-808, Computer
Vision Laboratory, University of Maryland,
College Park, MD, September 1979; also in
Proc. Image Understanding Workshop, November
1979, 31-35.

9. M. Shneier, Extracting linear features from
images using pyramids, TR-855, Computer Vision
Laboratory, University of Maryland, College
Park, MD, January 1980.

10. R. M. Haralick, Edge and region analysis for
digital image data, CGIP 12, 1980, 60-73.

11. P. J. Burt, Hierarchically derived piecewise
polynomial approximations to waveforms and
images, TR-838, Computer Vision Laboratory,
University of Maryland, College Park, MD,
November 1979.

12. A. Y. Wu, T. H. Hong, and A. Rosenfeld,
Threshold selection using quadtrees, TR-886,
Computer Vision Laboratory, University of
Maryland, College Park, MD, March 1980.

13. S. Ranade, A. Rosenfeld, and J. M. S. Prewitt,
Use of quadtrees for image segmentation,
TR-878, Computer Vision Laboratory, University
of Maryland, College Park, MD, February 1980.

14. S. Ranade, Use of quadtrees for edge enhancement,
TR-862, Computer Vision Laboratory,
University of Maryland, College Park, MD,
February 1980.


Figure 1b. Resulting equivalent convolution kernels at successive levels
(horizontal axis: spatial position x)




Figure 2. Region extraction by pyramid node clustering.

a) Results for a FLIR image of a tank
(see text) when the averages are
grouped (3,1) at the 2x2 level

b) Results for (2,2) grouping

c) Results for (2,1,1) grouping

d) Results for (2,2) grouping when the
input image is not blurred

e) Results for a blood cell image
using (2,1,1) grouping


Figure  3.  Pyramid  streak  extraction:  Results  for  three  airfield  images 


Figure 4. Q-images as aids in thresholding. The standard deviation
tolerances in the two examples are 2.6 and 1.6, respectively.
Counterclockwise from left:

Original image and histogram
Q-image and histogram
Q-image with small leaves deleted, and histogram
Histogram of small leaves only
Result of thresholding at mean of small-leaf histogram



Figure 5. Segmentation with the aid of a Q-image.

a) Result of applying Prewitt edge operator to
a tank image

b) Cleaned edge map (see Figure 6)

c) Blocks belonging to the tank peak on the
Q-image

d-f) Regions extracted by local thresholding in
the vicinity of these blocks, for three
choices of the threshold.


Figure 6. (a) Image of a part of Frederick Airport

(b) Result of non-maximum suppression
applied to the Prewitt edge detector
for 6(a) thresholded at 0.

(c) to (g) Results of successive iterations
of the enhancement process
thresholded at 0.


SOLVING FOR THE PARAMETERS OF OBJECT MODELS
FROM  IMAGE  DESCRIPTIONS 


David  G.  Lowe 


Artificial  Intelligence  Laboratory,  Computer  Science  Department 
Stanford  University,  Stanford,  California  94305 


Abstract 

A solution is given to the problem of calculating the
parameter values needed to bring the projection of a
three-dimensional object model into correspondence with an image.
The predictive components of the ACRONYM system form
tentative matches between elements of the model and elements
of the image description. These initial matches
are used to determine values for projection and model
parameters as a step towards making more detailed predictions.
The solution uses Newton-Raphson convergence in
conjunction with a modified camera transform which is expressed
in terms of image-centered parameters. The solution
is able to use lines in an image even when we lack knowledge
of their terminations. The method can also be used to determine
the positions of articulated parts of models and can
handle constraints on the object position.


1.  Introduction 

Any recognition task requires us to first have a representation
for the objects which are to be recognized. For
the visual recognition of three-dimensional objects, the most
obvious and natural representation is probably the use of
three-dimensional models which give a full specification of
an object's shape and appearance in a viewpoint independent
manner. The field of computer graphics has already had
a considerable amount of experience in working with such
models, but within the field of computer vision models of
this sort have rarely been used. One difficulty encountered in
using these models for computer vision has been the problem
of calculating the position and orientation of the model which
will bring its projection into correspondence with the image.
This paper describes a solution to this problem which makes
use of only a few tentative initial matches between the model
and image. In addition, the three-dimensional model need
not be fully specified and may contain articulated parts.
We show how the values of these model parameters can be
derived from the image at the same time that we calculate
the object's projection parameters. We refer to these two
issues as the problem of back-projection, since we must infer
the projection and model parameters from the image, rather
than the other way around. These new techniques should
make it much easier to use three-dimensional models in computer
vision applications and derive the many other benefits
of their use.

Obviously, the techniques of back-projection are not
sufficient in themselves to implement a complete vision system.
The research described here is just one part of the
ACRONYM research project at Stanford [2, 3, 4], which
brings together the many other components required in
a functioning visual recognition system. ACRONYM includes
both interpretive (bottom-up) and predictive (top-down)
components. The following discussion outlines three
major problems which have been inhibiting the use of
three-dimensional models in computer vision, and describes the
techniques used within ACRONYM to overcome them:

(1) One of the most difficult problems in the recognition
process is in getting started. It is hard to see how to
make any initial use of a viewpoint independent model if we
have no prior information about its orientation or location in
the image. This problem is handled within ACRONYM by a
general geometric reasoning system for deriving sets of
quasi-invariant features from the model. These quasi-invariants
are features which are observable over a wide range of orientations
and viewing conditions (for example, parallel lines in
the model will appear as parallel lines in the image under
most viewing conditions, and there will be restrictions on
their relative positions). By creating several sets of
quasi-invariants for an object and searching the image for instances
of them, it is possible to set up hypothesized matches between
a few image features and parts of the model to provide
a starting point for the techniques described in this paper.
The quasi-invariants are used only to generate hypothesized
matches, and not to evaluate the correctness of a match,
so it is quite reasonable for this first stage to return many
incorrect matches for each correct one.

(2) A second problem with using three-dimensional
models has been the difficulty of mathematically manipulating
them in order to compare them to features in the image.
It is largely with this problem that this paper is concerned.
The usual techniques of computer graphics are unintuitive
and inappropriate when applied to the back-projection
problem, and here we develop new mathematical tools for
deriving the projection parameters of the model present in a
particular image.

(3) Another important objection to most computer
graphics modelling systems is that the models seem to be
too tightly specified. We often wish to represent looser constraints
than giving the exact three-dimensional coordinates
of all parts of the model. This problem is partially solved
in ACRONYM by allowing almost all aspects of the model
to be parameterized in terms of variables, and an important
part of the back-projection solution given here is the ability
to solve for the values of these variables in a specific image.
ACRONYM also uses a system of tree-structured constraints
which allows the specification of generic classes of objects,
with the further specification of specific instances in terms
of the original generic model [2].

Computer vision researchers have rightly been wary of
investing the large amount of time and effort necessary to
create a full geometric modelling system when it has not
been easily apparent how to use these models. We hope
that the techniques given in this paper will show that such
models can be used in useful ways. For many recognition
tasks, there is no clear alternative to the use of
viewpoint-independent models. Even when an alternative does exist, an
important advantage in the use of three-dimensional models
is that they allow us to do the matching in the domain of
volumes rather than the domain of images, which gives the
matching process a much more appropriate basis on which
to make comparisons and measure error. In particular, the
ACRONYM system is able to work effectively within this
domain because all our models are themselves built up from
primitive volume elements in the form of generalized cones
[1], rather than using the more common surface descriptions
of objects. Generalized cones also provide a particularly appropriate
parameterization for many objects. However, the
techniques outlined in this paper are applicable to almost
any three-dimensional representation.

2.  Forward  Projection 

Before the back-projection techniques for deducing the
projection parameters can be presented, it is first necessary
to understand the methods used for projection in the forward
direction. The projection method presented here is similar
to those which are commonly used for computer graphics [6].
In essence, the technique is to specify the type of camera
being used and its location and orientation with respect to
the three-dimensional model. These parameters are used
in a coordinate transform to compute two-dimensional coordinates
for points in the image from three-dimensional model
coordinates.

The  following  transform  models  a  standard  camera  with 
the  lens  pointing  in  a  direction  normal  to  the  center  of  the 
image plane. The variable f specifies the distance of the
image plane from the projection point, and usually does not
need to be determined from the image when we are using
a known camera (for convenience, we can let f represent
the ratio of image distance to the width of the image plane,
which means that image coordinates vary from 0 to 1 across
the image). We must also specify a vector T giving the
location of the camera lens in terms of world coordinates, and a
rotation matrix R which depends on the camera orientation
and  maps  points  in  world  coordinates  into  points  in  a  coor¬ 
dinate  system  with  x  and  y  axes  parallel  to  the  x  and  y  axes 
of  the  camera  film  plane.  Then  the  transform 

$$(x, y, z) = R(p - T)$$

$$(x', y') = \left(\frac{fx}{z}, \frac{fy}{z}\right)$$


first transforms the point p in world coordinates into the
point (x, y, z) in camera-based coordinates, and then creates
the perspective projection of this point onto the image plane,
with image coordinates (x', y'). Since this transform maps
lines  in  the  model  into  lines  in  the  image,  it  is  only  necessary 
to  transform  the  endpoints  of  a  line  in  order  to  create  its 
complete  projection. 
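
The forward projection described above can be sketched in a few lines. This is only an illustration, assuming the usual perspective division (x', y') = (fx/z, fy/z) after the change of coordinates (x, y, z) = R(p - T); the function names, the row-list matrix layout, and the example camera are hypothetical and are not taken from ACRONYM.

```python
def rotate(R, v):
    """Apply a 3x3 rotation matrix R (a list of rows) to a 3-vector v."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


def project(p, R, T, f):
    """Forward projection: (x, y, z) = R(p - T), then (x', y') = (f*x/z, f*y/z)."""
    x, y, z = rotate(R, tuple(p[i] - T[i] for i in range(3)))
    if z == 0:
        raise ValueError("point lies in the plane of the lens (z = 0)")
    return (f * x / z, f * y / z)


# Hypothetical example: identity orientation, lens 10 units behind the origin.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
camera_T = (0.0, 0.0, -10.0)

# Projecting the two endpoints of a model line gives the endpoints of its image.
end_a = project((1.0, 2.0, 0.0), IDENTITY, camera_T, f=2.0)
end_b = project((3.0, -1.0, 5.0), IDENTITY, camera_T, f=2.0)
```

Because the transform maps lines to lines, only the two endpoints need to be projected to obtain the image of the whole model line.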

The  most  difficult  aspect  of  the  transformation  is  repre¬ 
senting  and  working  with  the  rotation  R.  Most  work  in  com¬ 
puter  graphics  chooses  to  represent  rotations  with  three-by- 
three  matrices,  but  this  representation  is  not  very  good  for 
our  purposes  since  it  uses  nine  variables  to  represent  some¬ 
thing  which  has  only  three  underlying  parameters.  Another 
possibility  is  to  represent  the  rotation  by  giving  its  axis  of 
rotation  plus  the  angle  of  rotation  about  this  axis.  In  fact, 
we  can  let  the  magnitude  of  the  axis  vector  represent  the 
magnitude  of  the  rotation,  and  we  have  thus  reduced  the 
rotation  to  the  minimal  three  parameters.  However,  the 
axis-angle  representation  requires  a  good  deal  of  computa¬ 
tion  when  we  actually  wish  to  rotate  a  point,  and  also  makes 
it difficult to compose rotations. Quaternions [7] are a
representation which combines the advantages of these other
methods  and  have  proved  to  be  the  most  useful  for  our  work. 
They  use  four  variables  to  represent  a  rotation  in  such  a  way 




that  composition,  normalization,  rotation,  and  creation  of 
a  rotation  about  an  arbitrary  axis  are  all  computationally 
efficient.  However,  the  back-projection  solution  we  are  about 
to  present  is  independent  of  any  particular  representation  for 
rotations. 
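
As a rough illustration of why quaternions are convenient here, the sketch below implements the four operations named above (creation from an axis and angle, composition, normalization, and rotation of a point) using the standard Hamilton product. The function names and the (w, x, y, z) component order are assumptions made for this example; they are not taken from [7] or from the ACRONYM code.

```python
import math


def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)


def quat_mul(a, b):
    """Hamilton product a*b: the rotation b followed by the rotation a."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def quat_normalize(q):
    """Rescale q to unit length, removing accumulated rounding drift."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def quat_rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q via q * (0, v) * conj(q)."""
    w, x, y, z = q
    conjugate = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0,) + tuple(v)), conjugate)
    return (rx, ry, rz)


quarter_z = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)
half_z = quat_normalize(quat_mul(quarter_z, quarter_z))  # two quarter turns
p1 = quat_rotate(quarter_z, (1.0, 0.0, 0.0))  # close to (0, 1, 0)
p2 = quat_rotate(half_z, (1.0, 0.0, 0.0))     # close to (-1, 0, 0)
```

Composition costs one quaternion product and normalization one square root, which is the kind of economy the text refers to.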


3.  Back-projection  using  the  Newton-Raphson  method. 

There are seven underlying parameters in the camera
transform presented above: three parameters give the camera
position T, three more are sufficient to specify the rotation
R, and f specifies a property of the camera itself.
Therefore, the back-projection problem is simply to
calculate the values for these parameters which produce the
best  fit  between  an  image  and  the  projection  of  a  model. 
However,  because  of  problems  in  calculating  a  rotation  in 
terms  of  its  three  underlying  parameters,  there  appears  to 
be no straightforward symbolic solution to the problem, and
we  are  forced  to  seek  an  iterative  solution.  The  method  we 
have  chosen  is  Newton-Raphson  convergence,  which  has  a 
fast  quadratic  convergence  and  can  be  cleanly  applied  to  this 
problem.  The  Newton-Raphson  method  works  by  measuring 
each  error  in  the  current  approximation  and  calculating  the 
derivative  of  each  of  these  error  measurements  with  respect 
to each of the underlying parameters. We then create a
system of linear equations which expresses each error as the
sum of the corrections to be made in each parameter times
the derivative of the error with respect to that parameter. By
solving this system of linear equations, we determine
corrections to be added to the parameters which will correct for the
originally measured errors. This technique works best when
the  derivatives  are  all  fairly  independent  of  one  another,  and 
are  smooth  enough  over  the  error  range  for  good  convergence. 

Unfortunately, the specification of the camera transform
given in the previous section does not have simple
derivatives of x' and y' with respect to the camera transform
parameters. Once again, this is a result of the fact
that it is difficult to represent a rotation in terms of its three
underlying parameters. We have solved this difficulty by
reparameterizing the camera transform to express it in terms
of parameters that are related to the camera coordinate
system rather than world coordinates. This new transform must
be chosen carefully from among the many possibilities in
order to keep the parameters as independent as possible from
each other and to keep the derivatives simple. As before, our
new transform specifies how a three-dimensional point p is
to be mapped onto a point in the image (x', y'):

$$(x, y, z) = R(p)$$

$$(x', y') = \left(\frac{fx}{z + D_z} + D_x, \frac{fy}{z + D_z} + D_y\right) = (fxc + D_x, fyc + D_y) \quad \text{where } c = \frac{1}{z + D_z}$$

Here the variables R and f remain the same as in the previous
transform, but the vector T has been replaced by (D_x, D_y, D_z),
where the two transforms are equivalent when

$$T = R^{-1}\left(-\frac{D_x (z + D_z)}{f}, -\frac{D_y (z + D_z)}{f}, -D_z\right)$$
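
The equivalence of the two parameterizations can be checked numerically. The sketch below assumes the world-coordinate form (x, y, z) = R(p - T), (x', y') = (fx/z, fy/z), the camera-centered form (x', y') = (fxc + D_x, fyc + D_y) with c = 1/(z + D_z), and the relation T = R^{-1}(-D_x(z + D_z)/f, -D_y(z + D_z)/f, -D_z). All function names and numeric values are hypothetical.

```python
import math


def mat_vec(R, v):
    """Multiply a 3x3 matrix (a list of rows) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(R):
    """For a rotation matrix the transpose is also the inverse."""
    return [[R[j][i] for j in range(3)] for i in range(3)]


def project_world(p, R, T, f):
    """World-coordinate form: (x, y, z) = R(p - T), (x', y') = (f x / z, f y / z)."""
    x, y, z = mat_vec(R, [p[i] - T[i] for i in range(3)])
    return (f * x / z, f * y / z)


def project_camera(p, R, D, f):
    """Camera-centered form: (x, y, z) = R(p), (x', y') = (f x c + Dx, f y c + Dy)."""
    x, y, z = mat_vec(R, p)
    c = 1.0 / (z + D[2])
    return (f * x * c + D[0], f * y * c + D[1])


def equivalent_T(p, R, D, f):
    """T = R^-1 (-Dx (z + Dz) / f, -Dy (z + Dz) / f, -Dz), with z taken from R(p)."""
    depth = mat_vec(R, p)[2] + D[2]
    return mat_vec(transpose(R), [-D[0] * depth / f, -D[1] * depth / f, -D[2]])


a = 0.3  # hypothetical camera rotation of 0.3 radians about the y axis
R = [[math.cos(a), 0.0, math.sin(a)],
     [0.0, 1.0, 0.0],
     [-math.sin(a), 0.0, math.cos(a)]]
p, D, f = (1.0, 2.0, 3.0), (0.2, -0.1, 6.0), 1.5

image_new = project_camera(p, R, D, f)
image_old = project_world(p, R, equivalent_T(p, R, D, f), f)
```

Note that the equivalent T depends on the depth z of the point being projected, so the check is made one point at a time.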

The new parameterization is much better for our
purposes, since D_x and D_y simply specify the location of the
object on the image plane and D_z specifies the distance of the
object from the camera; compare this with the very indirect
specification of these same camera-related variables given by
T. However, we have still solved only half the problem, since
the three parameters underlying the rotation matrix are still
difficult to express in a form closely related to the image.
Our solution to this second problem was not to try to
somehow express R in terms of image-centered parameters, but
to take the initial specification of R as given and add to it
incremental rotations φ_x, φ_y and φ_z about the x, y and z
axes of the camera coordinate system. It is easy to compose
rotations (especially when the quaternion representation of
rotations is used as mentioned above), and the incremental
rotations are fairly independent of one another if they are
small. The Newton-Raphson method is now carried out by
correcting errors in x' and y' by calculating the optimum
correction rotations φ_x, φ_y and φ_z to be made about the
image axes. Instead of adding these corrections to underlying
parameters of R, we create rotations of the given
magnitudes about their respective coordinate axes and compose
these new rotations with R.

One big advantage of using the φ's as our convergence
parameters is that the derivatives of x, y and z (and therefore
of x' and y') with respect to them can be expressed
in a strikingly simple form. For example, the derivative of
x at a point (x, y) with respect to a counter-clockwise rotation
of φ_z about the z axis is simply -y, since (x, y) =
(d cos φ_z, d sin φ_z) and therefore ∂x/∂φ_z = -d sin φ_z = -y.
The following table gives these derivatives for all combinations
of values:


          x     y     z

φ_x       0    -z     y

φ_y       z     0    -x

φ_z      -y     x     0

Partial derivatives of x, y and z with respect
to counterclockwise rotations φ_i (in radians)
about the coordinate axes.
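
The entries of this table can be confirmed with a finite-difference check. The sketch below assumes the standard right-handed, counter-clockwise rotation formulas about each coordinate axis and compares the analytic rows (0, -z, y), (z, 0, -x) and (-y, x, 0) with central differences at zero rotation; the function names and the sample point are illustrative only.

```python
import math


def rotate(p, axis, angle):
    """Counter-clockwise rotation of point p about the named coordinate axis."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    if axis == "x":
        return (x, y * c - z * s, y * s + z * c)
    if axis == "y":
        return (x * c + z * s, y, -x * s + z * c)
    if axis == "z":
        return (x * c - y * s, x * s + y * c, z)
    raise ValueError("axis must be 'x', 'y' or 'z'")


def table_row(p, axis):
    """Analytic row: (dx/dphi, dy/dphi, dz/dphi) for a rotation about `axis`."""
    x, y, z = p
    return {"x": (0.0, -z, y), "y": (z, 0.0, -x), "z": (-y, x, 0.0)}[axis]


def numeric_row(p, axis, h=1e-6):
    """Central-difference estimate of the same three derivatives at angle zero."""
    plus, minus = rotate(p, axis, h), rotate(p, axis, -h)
    return tuple((a - b) / (2.0 * h) for a, b in zip(plus, minus))


point = (0.7, -1.3, 2.1)
max_gap = max(abs(a - n)
              for axis in "xyz"
              for a, n in zip(table_row(point, axis), numeric_row(point, axis)))
```

The largest disagreement, `max_gap`, should be at the level of floating-point noise.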




Given these derivatives it is straightforward to
accomplish our original objective of calculating the partial
derivatives of x' and y' with respect to each of the original
camera parameters. For example, our transform tells us that:


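$$x' = \frac{fx}{z + D_z} + D_x$$

so

$$\frac{\partial x'}{\partial D_x} = 1$$

and

$$\frac{\partial x'}{\partial \phi_y} = \frac{f}{z + D_z}\frac{\partial x}{\partial \phi_y} - \frac{fx}{(z + D_z)^2}\frac{\partial z}{\partial \phi_y} = fcz + fc^2x^2 = fc(z + cx^2)$$

and

$$\frac{\partial x'}{\partial \phi_z} = \frac{f}{z + D_z}\frac{\partial x}{\partial \phi_z} = -fcy$$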

All the other derivatives can be calculated in a similar way,
and this next table gives the derivatives of x' and y' with
respect to each of the seven parameters of our camera model:


            x'                 y'

D_x         1                  0

D_y         0                  1

D_z        -fc^2 x            -fc^2 y

φ_x        -fc^2 xy           -fc(z + cy^2)

φ_y         fc(z + cx^2)       fc^2 xy

φ_z        -fcy                fcx

f           cx                 cy

Partial derivatives of x' and y' with respect
to each of the camera transform parameters.

Given these partial derivatives of x' and y', it is easy
to  perform  the  convergence.  For  each  point  in  the  model 
which  should  match  against  some  corresponding  point  in  the 
image,  we  first  calculate  the  camera  transform  of  the  model 
point  and  measure  the  error  in  its  x  component  when  com¬ 
pared  to  the  given  image  point.  We  then  create  an  equation 
which  expresses  this  error  E  as  the  sum  of  the  products  of 
its  partial  derivatives  times  the  error  correction  values: 


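$$\frac{\partial x'}{\partial D_x}\Delta D_x + \frac{\partial x'}{\partial D_y}\Delta D_y + \frac{\partial x'}{\partial D_z}\Delta D_z + \frac{\partial x'}{\partial \phi_x}\Delta\phi_x + \frac{\partial x'}{\partial \phi_y}\Delta\phi_y + \frac{\partial x'}{\partial \phi_z}\Delta\phi_z = E$$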


Using  the  same  point  we  create  a  similar  equation  for  its 
y  component,  so  for  each  point  correspondence  we  derive  two 
equations.  From  three  point  correspondences  we  can  derive 


six  equations  and  produce  a  complete  linear  system  which 
can  be  solved  for  all  six  camera  model  corrections  (we  are 
assuming that the camera parameter f is either given, or
can be approximated by a large value). After adding these
corrections to our starting camera model, we have completed
one iteration of the convergence. After each iteration the Δ
terms  should  shrink  by  about  one  order  of  magnitude,  and 
no  more  than  a  few  iterations  should  be  needed  even  for  high 
accuracy. 

In  most  applications  of  this  method  we  will  be  given 
more  correspondences  between  model  and  image  than  are 
strictly necessary, and we will want to perform some kind
of best fit. In this case the Gauss least-squares method can
easily be applied. The matrix equation given above can be
expressed as

$$[A][\Delta] = [E]$$

where [A] is the derivative matrix, [Δ] is the vector of
unknown corrections, and [E] is the vector of error terms.
When this system is overdetermined, we can perform a
least-squares fit of the errors simply by solving

$$[A]^T[A][\Delta] = [A]^T[E]$$

where [A]^T[A] is square and has the correct dimensions for
the vector [Δ].
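
The whole iteration, including the least-squares step, can be sketched as follows. This is an illustrative reconstruction rather than the ACRONYM code: it assumes the camera-centered transform with c = 1/(z + D_z), a known f, rotations stored as 3x3 matrices instead of quaternions, and a simple Gaussian elimination for the normal equations. The model points, the "true" pose, and all names are hypothetical.

```python
import math


def mat_vec(R, v):
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def axis_rotation(axis, a):
    """Counter-clockwise rotation by `a` radians about coordinate axis 0, 1 or 2."""
    c, s = math.cos(a), math.sin(a)
    if axis == 0:
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    if axis == 1:
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def project(p, R, D, f):
    x, y, z = mat_vec(R, p)
    c = 1.0 / (z + D[2])
    return (f * x * c + D[0], f * y * c + D[1])


def jacobian_rows(p, R, D, f):
    """Derivatives of x' and y' with respect to Dx, Dy, Dz, phi_x, phi_y, phi_z."""
    x, y, z = mat_vec(R, p)
    c = 1.0 / (z + D[2])
    row_x = [1.0, 0.0, -f * c * c * x, -f * c * c * x * y, f * c * (z + c * x * x), -f * c * y]
    row_y = [0.0, 1.0, -f * c * c * y, -f * c * (z + c * y * y), f * c * c * x * y, f * c * x]
    return row_x, row_y


def solve(M, b):
    """Solve M u = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    u = [0.0] * n
    for i in reversed(range(n)):
        u[i] = (aug[i][n] - sum(aug[i][k] * u[k] for k in range(i + 1, n))) / aug[i][i]
    return u


def newton_step(model, image, R, D, f):
    """One iteration: build [A] and [E], solve [A]^T[A][d] = [A]^T[E], apply d."""
    A, E = [], []
    for p, (u, v) in zip(model, image):
        px, py = project(p, R, D, f)
        row_x, row_y = jacobian_rows(p, R, D, f)
        A += [row_x, row_y]
        E += [u - px, v - py]
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(6)] for i in range(6)]
    AtE = [sum(row[i] * e for row, e in zip(A, E)) for i in range(6)]
    d = solve(AtA, AtE)
    D = [D[0] + d[0], D[1] + d[1], D[2] + d[2]]
    for axis in range(3):  # compose the incremental rotations with R
        R = mat_mul(axis_rotation(axis, d[3 + axis]), R)
    return R, D, max(abs(e) for e in E)


# Hypothetical test case: synthesize image points from a known pose, then recover it.
model = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (-1, -1, 0), (1, 1, 1), (-1, 1, -1)]
f = 1.0
R_true = mat_mul(axis_rotation(1, 0.2), axis_rotation(0, -0.1))
D_true = [0.1, -0.05, 8.0]
image = [project(p, R_true, D_true, f) for p in model]

R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
D = [0.0, 0.0, 10.0]
errors = []
for _ in range(8):
    R, D, err = newton_step(model, image, R, D, f)
    errors.append(err)
```

Here `errors` records the largest image error seen before each correction. With a starting guess this close to the true pose, the values should fall rapidly over the first few iterations, in line with the behaviour described in the text.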

The  convergence  properties  of  this  solution  are  such 
that  there  should  be  few  problems  in  picking  the  initial 
parameter  values  from  which  to  converge.  As  long  as  the 
rotation errors φ_x, φ_y and φ_z are not greater than about
90 degrees, almost any values can be chosen for the other
parameters.  Usually,  the  source  of  the  hypothesized  matches 
carries  a  rough  implication  of  the  orientation  of  the  object — 
for  example,  the  quasi-invariants  mentioned  in  the  introduc¬ 
tion  are  only  applicable  over  a  certain  range  of  object  orien¬ 
tations,  so  we  have  some  idea  of  the  object’s  orientation  just 
by  knowing  which  set  of  quasi-invariants  were  used. 

The  situation  may  arise  in  which  the  values  of  some 
parameters  are  not  deducible  from  the  image  and  our  solu¬ 
tion  is  under-constrained.  These  situations  can  be  handled 
by just not including the indeterminate parameters in our
equations,  but  we  have  not  yet  developed  a  general  method 
for  deciding  which  parameters  these  are.  We  expect  a  solu¬ 
tion  to  this  problem  shortly. 

4.  Using  incompletely  specified  models 

So  far  we  have  been  describing  the  process  of  solving 
for  the  parameters  which  determine  an  object’s  position  and 
orientation  with  respect  to  the  external  world.  An  important 
extension  to  this  method  is  the  ability  to  use  models  which 




are parameterized internally, and have variable parts or
articulations between parts. We can determine the values of
these model parameters in the same way that we determine
the correct projection parameters. The only requirement is
that we be able to calculate the directional derivatives of
points in the model with respect to the new parameters.
For the common types of model parameterization, such as
variable lengths or variable rotations about some axis, these
derivatives are easily determined. From these it is easy to
calculate the derivatives of projected image points and use
them in the same way for convergence as before. We will
now  require  more  given  correspondences  between  image  and 
model  in  order  to  have  a  fully-determined  system,  but  since 
each  additional  correspondence  between  an  image  point  and 
a  model  point  allows  us  to  solve  for  two  more  unknown  vari¬ 
ables  there  should  be  little  difficulty  in  meeting  this  require¬ 
ment. 

The power of this method can be best illustrated by
giving an example. Assume that we want to recognize pieces
of different types of airplanes, and we do not know in
advance which type of airplane will be in a certain image.
In this case our airplane model will have to be quite general
and will not be able to give precise measurements for various
lengths or such things as the angle between the wings and the
fuselage. However, certain important constraints are known,
such as the fact that the airplane will be symmetrical about
the fuselage. This symmetry will be represented to the
convergence algorithm by the fact that the model parameters
referring to the right wing will be the same as those referring
to the left wing, and any changes in these parameters refer to
both wings. The convergence algorithm will then determine
a camera transform and wing-fuselage angle which together
produce the closest fit of model to image, as in the example
below. Note that there may well be insufficient information
to determine either the camera transform or the wing-fuselage
angle independently, so the ability to solve for both
simultaneously using knowledge of the plane's symmetry is
crucial to determining a solution.


[Figure: The first four iterations for the ...]


Another type of constraint arises when we have some
prior knowledge about the location of an object. For
example, we may know the position of the ground plane
relative to the camera and we can constrain the airplane to
be positioned on the ground. In this case the airplane has
only three degrees of freedom in its position (its x and y
location on the ground and its orientation about the
vertical axis). In this situation we do not need to solve for the
full camera model, since this has already been determined
relative to the ground plane. Instead, we can just solve for
the parameters giving the position of the airplane relative
to the ground using the techniques given above. This
suggests that a more uniform description of the back-projection
algorithm  would  be  to  treat  the  parameters  which  we  have 
been  calling  “projection  parameters”  as  just  other  kinds  of 
model  parameters  which  give  the  position  of  the  entire  ob¬ 
ject  relative  to  camera  space.  This  is  not  an  entirely  trivial 
change,  as  it  will  require  a  special  mechanism  to  represent 
and  work  with  the  arbitrary  rotation  which  can  describe  an 
object’s  orientation  with  respect  to  the  camera.  However, 
the  change  would  make  it  conceptually  easier  to  work  with 
various  situations  in  which  we  have  multiple  objects  and  only 
one  camera.  This  is  an  area  for  further  research. 

One  more  extension  that  should  be  added  to  the  basic 
algorithm  is  to  allow  the  user  to  specify  a  legal  range  for 
each  parameter.  If  the  algorithm  converges  to  a  value 
for  a  parameter  which  is  outside  of  its  legal  range,  then 
this  parameter  can  be  set  to  its  closest  legal  value  and  we 
can  reconverge  using  the  remaining  parameters.  This  will 
produce  a  more  accurate  result  when  errors  cause  the  least- 
squares  fit  to  move  a  parameter  to  some  value  we  know  it 
cannot  have. 
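
The legal-range extension might be organized as in the sketch below: after a convergence, any parameter outside its range is moved to the nearest bound and reported, so that the caller can hold it fixed and reconverge on the rest. The function name, the dictionary layout, and the `wing_sweep` parameter are hypothetical; the paper does not describe a data structure for this step.

```python
def clamp_to_legal_range(values, legal_ranges):
    """Return (clamped values, names of parameters that were moved to a bound).

    `values` maps parameter names to converged values; `legal_ranges` maps a
    subset of those names to (low, high) pairs. Parameters without an entry
    are treated as unconstrained.
    """
    clamped, fixed = {}, []
    for name, value in values.items():
        low, high = legal_ranges.get(name, (float("-inf"), float("inf")))
        if value < low or value > high:
            value = min(max(value, low), high)
            fixed.append(name)  # exclude this parameter when reconverging
        clamped[name] = value
    return clamped, fixed


# Hypothetical converged values: the wing sweep has drifted past its legal maximum.
converged = {"Dz": 8.2, "wing_sweep": 1.3, "phi_z": 0.05}
ranges = {"wing_sweep": (0.0, 1.0), "Dz": (1.0, 100.0)}
clamped, fixed = clamp_to_legal_range(converged, ranges)
```

The parameters named in `fixed` would then be left out of the linear system for the next round of iterations, in the same way the text suggests omitting indeterminate parameters.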

5.  Making  use  of  line  data 

Another  problem  with  the  simple  back-projection  al¬ 
gorithm  is  that  the  initial  image-model  correspondences  we 
derive are usually in terms of lines rather than points. This
is because low-level vision routines are relatively good at
finding the location of lines but are much less certain about
exactly where the lines terminate. What we need to do is
express our errors in terms of the distance of one line from
another, rather than in terms of the error in the locations
of points. The solution we have adopted is to measure as
our errors the perpendicular distance of each endpoint of the
model line from the corresponding line in the image, and
to then take our derivatives in terms of this distance rather
than in terms of x' or y'. This specifies mathematically what
we want to say: that the model line should lie on top of the
image  line  but  that  the  endpoints  need  not  correspond.  In 
order  to  express  the  perpendicular  distance  of  a  point  from 


a line it is useful to first express the line as an equation of
the following form, in which m is the slope:

$$\frac{m}{\sqrt{m^2 + 1}}\,x - \frac{1}{\sqrt{m^2 + 1}}\,y = d$$

In  this  equation  d  is  the  perpendicular  distance  of  the  line 
from the origin. If we substitute some point (x', y') into the
left side of the equation and calculate the new value of d for
this point (call it d'), then the perpendicular distance of this
point from the line is simply d - d'. What is more, it is
easy to calculate the derivatives of d' for use in the
convergence, since the derivatives of d' are just a linear sum of the
derivatives of x' and y' as given in the above equation, and
we already know how to calculate the x' and y' derivatives
from the solution given for using point correspondences. The
result  is  that  each  line-line  correspondence  we  are  given  be¬ 
tween  model  and  image  gives  us  two  equations  for  our  linear 
system — the  same  amount  of  information  that  is  conveyed 
by  a  point-point  correspondence. 
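
A small sketch of the line-based error measure follows. It assumes the image line is given in slope-intercept form y = mx + b and uses the normalized form (m x - y)/sqrt(m^2 + 1) = d, which is one sign convention consistent with the description above; the sign choice, the function names, and the example values are assumptions. Vertical image lines cannot be written in slope form and would need separate handling.

```python
import math


def line_coefficients(m, b):
    """Normalized form a*x + c*y = d of the line y = m*x + b.

    |d| is the perpendicular distance of the line from the origin.
    """
    n = math.sqrt(m * m + 1.0)
    return (m / n, -1.0 / n, -b / n)


def line_error(m, b, point):
    """Signed perpendicular distance d - d' of `point` from the line."""
    a, c, d = line_coefficients(m, b)
    d_prime = a * point[0] + c * point[1]
    return d - d_prime


def line_error_derivative(m, b, dpoint):
    """Derivative of d' for one parameter, given (d x'/d param, d y'/d param).

    Because d' is linear in x' and y', its derivative is the same linear
    combination of the point derivatives.
    """
    a, c, _ = line_coefficients(m, b)
    return a * dpoint[0] + c * dpoint[1]


# Hypothetical checks: horizontal line y = 1 and the diagonal y = x.
err_horizontal = line_error(0.0, 1.0, (3.0, 4.0))      # point is 3 units above
err_diagonal = line_error(1.0, 0.0, (0.0, 2.0))        # sqrt(2) from the diagonal
slope_term = line_error_derivative(0.0, 1.0, (0.5, 0.25))
```

Each endpoint of a model line contributes one such error and one row of derivatives, so a line-line correspondence yields two equations, as stated above.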

6.  Implementation 

The full back-projection method and extensions
described above have been implemented as a module of the
ACRONYM system. The algorithm has performed well and
usually converges to the correct transform and parameter
values to within about 1 part in 10^4 in less than 6 iterations.
The algorithm is implemented in compiled MACLISP
running on a DEC KL-10, and each iteration executes in less
than 20 milliseconds.

The pictures on the previous page were produced by
successive  iterations  towards  a  solution  for  fitting  a  simple 
airplane  model  to  a  few  lines  in  a  simulated  image.  The 
airplane  model  was  parameterized  so  that  the  wings  could  be 
swept  back  and  forth,  and  the  algorithm  was  able  to  deter¬ 
mine  the  airplane’s  orientation  and  the  degree  to  which  the 
wings  were  swept  back  from  knowledge  of  the  model's  sym¬ 
metry. 

We have not yet fully integrated this process with the
rest of the ACRONYM system, but it is fairly clear how
most of this integration can be accomplished. Once a good
approximation to the camera transform has been obtained,
the model provides strong constraints on where to look for
line segments corresponding to other parts of the model. A
simple approach is to make use only of predictions which
predict a line segment at a specific location in the image, and
to examine the image for the instantiation of this prediction
within small error bounds. A much more general approach
would be to integrate the back-projection process with the
geometric reasoning and matching components of ACRONYM,
so  that  the  constraints  propagate  throughout  the  high-level 




predictions. An important aspect of this further processing
would be techniques for discarding wild points in the
original matches, and reconverging to the best fit of the
remaining  matches.  There  should  not  be  much  difficulty  in 
finally  deciding  whether  a  model  matches  at  a  certain  loca¬ 
tion  in  the  image,  since  the  three-dimensional  model  makes 
so  many  specific  predictions  about  the  appearance  of  the 
image. 

7.  Conclusion. 

The  use  of  back-projection  in  conjunction  with  view¬ 
point  independent  models  should  enable  a  computer  vision 
system  to  make  effective  use  of  the  valuable  constraints  and 
predictions  embodied  in  such  models.  It  is  interesting  to 
note  some  apparent  similarities  between  this  use  of  models 
and  the  results  of  psychological  experiments  in  human  vision. 
The mental rotation of images and the role of expectations
in  perception  are  both  active  research  areas  in  psychology. 
Shepard  and  Metzler  [8]  showed  that  the  length  of  time  sub¬ 
jects  took  to  compare  two  pictures  of  a  three-dimensional  ob¬ 
ject  was  directly  proportional  to  the  three-dimensional  angle 
of  rotation  between  them.  The  classic  work  of  Leeper  [5] 
showed  that  people  could  often  instantly  identify  degraded 
pictures when given an identifying label after being unable to
make  any  sense  of  the  picture  when  examining  it  without  the 
label.  In  examples  such  as  these,  or  in  some  cases  where  only 
simple  line  data  is  available,  it  appears  that  recognition  is 
crucially  dependent  on  expectations  derived  from  some  sort 
of  prior  visual  representation,  and  little  can  be  done  if  we  are 
limited  to  information  derived  bottom-up  from  the  image. 


References

[1] Binford, Thomas O., "Visual Perception by Computer,"
Invited paper at IEEE Systems Science and Cybernetics
Conference, Miami, Dec. 1971.

[2] Brooks, Rodney A. and Thomas O. Binford, "Representing
and Reasoning about Partially Specified Scenes,"
Proc. ARPA Image Understanding Workshop, Baltimore,
Apr. 1980.

[3] Brooks, Rodney A., Russell Greiner and Thomas O.
Binford, "A Model-Based Vision System," Proc.
ARPA Image Understanding Workshop, Cambridge,
May 1978, 36-44.

[4] Brooks, Rodney A., Russell Greiner and Thomas O.
Binford, "The ACRONYM Model-Based Vision System,"
Proc. of IJCAI-79, Tokyo, Aug. 1979, 105-113.

[5] Leeper, R., "A study of a neglected portion of learning:
the development of sensory organization," Journal
of Genetic Psychology, 1935, 46:41-75.

[6] Rogers, David and J. Alan Adams, Mathematical Elements
for Computer Graphics, McGraw-Hill, 1976.

[7] Salamin, Gene, "Application of Quaternions to Computation
with Rotations," Internal working paper, Stanford
Artificial Intelligence Laboratory, 1979.

[8] Shepard, R. N. and J. Metzler, "Mental Rotation of Three-
Dimensional Objects," Science, 1971, 171:701-703.




ASPECTS OF A COMPUTATIONAL
THEORY OF HUMAN STEREO VISION

W.E.L. Grimson


MIT Artificial Intelligence Laboratory
545 Technology Square, Cambridge MA 02139 USA


Marr and Poggio (1979) presented a theory of human stereo vision.
An implementation of that theory is presented, and consists of five
steps: (1) The left and right images are each filtered with masks of
four sizes that increase with eccentricity; the shape of these masks
is given by ∇²G [...] (5) When a correspondence is achieved [...]


1.  Introduction 

If two objects are separated in depth from a viewer, then the
relative positions of their images will differ in the two eyes. This
difference in relative positions, the disparity, may be measured and
used to estimate depth [...]. Stereo vision, in essence, [...] depth
information for surfaces in the scene.

[...] Poggio [...]: (S1) a particular location on a surface in the
scene must be selected from one image; (S2) that same location must
be identified in the other image; and (S3) the disparity between the
two corresponding image points must be measured. The difficulty of the
problem [...], that is, in matching [...] of the same location [...]
correspondence problem. For the case of the human [...] that this
matching takes place very early [...].

[...] random dot patterns, Julesz (1960) demonstrated [...] random
dot [...] viewed monocularly [...] to set up a correspondence between
two arrays of [...] monocular or high level cues [...].

[...] at different scales. (2) Zero-crossings in the filtered images
are [...]. (3) [...] matching takes place between zero-crossings [...]
of roughly the same orientation in the two images [...]




details of the theory explicit. This often uncovers previously overlooked
difficulties, thereby guiding further refinement of the theory.

A second benefit concerns the performance of the implementation.
Any proposed model of a system must be testable. In this case, by
testing on pairs of stereo images, one can examine the performance
of the implementation, and hence of the theory itself, provided, of
course, that the implementation is an accurate representation of that
theory. In this manner, the performance of the implementation can be
compared with human performance. If the algorithm differs strongly
from known human performance, its suitability as a biological model is
quickly brought into question (cf. the cooperative algorithm of Marr
and Poggio (1976)).

This article describes an implementation of the Marr-Poggio stereo
theory, written with particular emphasis on the matching process (Grimson
and Marr, 1979). For details of the derivation and justification of the
theory, see Marr and Poggio (1979).

The first part of this paper describes the overall design of the
implementation. Several examples of the implementation's performance
on different images are then discussed, including random dot stereograms
from the human stereopsis literature, such as those with one image defocussed,
noise introduced into part of the images' spectra, and so forth. It is
shown that the implementation behaves in a manner similar to humans
on these special cases. Thirdly, the theory makes some statistical
assumptions; these are compared with the actual statistics found in
practice. Finally, the results of running the program on some natural images
are shown.
2.  Design  of  the  program 

The implementation is divided into five modules, roughly
corresponding to the five steps in the summary above. These modules, and
the flow of information between them, are illustrated in Figure 1. Each
of the components is described in turn.

2.1  Input 

There are two aspects of the human stereo system, embedded in
the Marr-Poggio theory, which must be made explicit in the input to
the algorithm. The first is the position of the eyes with respect to the
scene, as eye movements will be critical for obtaining fine disparity
information. The second is the change in resolution of the
image with increasing eccentricity.

To account for these effects, the algorithm maintains as its
initial input a stereo pair of images, representing the entire scene visible
to the viewer. This pair of images corresponds to the environment
around the visual system, rather than some integral part of the
system itself. To create this representation of the scene, natural images
were digitized on an Optronix Photoscan System P1000. The sizes
of these images are indicated in the legends. Grey-level resolution is
8 bits, providing 256 intensity levels. For the random dot patterns
illustrated in this article, the images were constructed by computer,
rather than digitized from a photograph.

For a given position of the eyes, relative to the scene, a
representation of the images on the two retinas is extracted. The algorithm
creates this retinal representation by obtaining a second, smaller pair
of images from the images representing the whole scene. The mapping
from the scene images into the retinal images accounts for the
two aspects described above. First, different portions
of the scenes will be mapped to the center (fovea) of the retinal
images as the positions of the eyes are varied. Since the matching
process will take place on the array representing the retinal images,
it is important that the coordinate systems of those arrays coincide
with the current positions of the eyes. Note that the portion of the
scene image which is mapped into the retinal image may differ for
the two eyes, depending on the relative positions of the two optical
axes. In particular, there may be differences in vertical alignment as
well as in horizontal alignment. Second, the Marr-Poggio theory also
states that the resolution of the earlier stages of the algorithm (the
convolution and zero-crossings) scales linearly with eccentricity.
The most convenient method for dealing with this fact is to account
for the scaling with eccentricity at the level of the extraction of the
images. This means that rather than extracting a set of retinal images
in a linear manner, we may map the scene into the retinal images
by a mapping whose magnification varies with eccentricity. By so
doing, the later stages of processing need not explicitly account for
the variation with eccentricity. Rather, these processes are considered
as operating on a uniform grid. Note that this eccentric mapping
is not essential, especially for small images. In most of the cases
illustrated in this article, the mapping was not used.

After the completion of this stage, the implementation has
created a representation of the images that has accounted for eye
position and for retinal scaling with eccentricity. For each pass of the
algorithm, the matching will take place on the representation of the
retinal images, thereby implicitly assuming some particular eye
positions. Once the matching has been completed, the disparity values
obtained may be used to change the positions of the two optic axes,
thus causing a new pair of retinal images to be extracted from the
representations of the scene, and the matching process may proceed
again.

2.2 Convolution

Given the retinal representations of the images, it is then
necessary to transform them into a form upon which the matcher may
operate. Marr and Poggio (1979) argued that the items to be matched
in an image must be in one-to-one correspondence with well-defined
locations on a physical surface. This led to the use of image
predicates which correspond to changes in intensity. Since these intensity
changes can occur over a wide range of scales within a natural image,
they are detected separately at different scales. This is in agreement
with the findings of Campbell and Robson (1968), who showed that
visual information is processed in parallel by a number of independent
spatial-frequency-tuned channels, and with the findings of Julesz
and Miller (1975) and Mayhew and Frisby (1976), who showed that
spatial-frequency-tuned channels are used in stereopsis and are
independent. Recent work by Wilson and Bergen (1979) and Wilson
and Giese (1977) provided evidence for the particular form of these
spatial-frequency-tuned operators. Measuring contrast sensitivity to
vertical line stimuli, Wilson and his collaborators showed that the
image is convolved with an operator which in one dimension may
be closely approximated by a difference of two gaussian functions
(DOG).

In the original theory (Marr and Poggio, 1979), the proposed
masks were oriented bar masks whose cross-section was a difference
of two gaussians, as given by the Wilson and Bergen data. If an
intensity change occurs along a particular orientation in the image, there
will be a peak in the first directional derivative of intensity, and a
zero-crossing in the second directional derivative. Thus, the intensity
changes in the image can be located by finding zero-crossings in the
output of a second directional derivative operator. However, a number
of practical considerations have led Marr and Hildreth (1979)
to suggest that the initial operators not be directional operators.
The only non-directional linear second derivative operator is the
Laplacian. Marr and Hildreth have shown that provided two simple
conditions on the intensity function in the neighbourhood of an edge
are satisfied, the zero-crossings of the second directional derivative
taken perpendicular to an edge will coincide with the zero-crossings
of the Laplacian along that edge. Therefore, theoretically, we can
detect intensity changes occurring at all orientations using the single
non-oriented Laplacian operator. Thus, Marr and Hildreth propose
that intensity changes occurring at a particular scale may be detected
by locating the zero-crossings in the output of $\nabla^2 G$, the Laplacian
of a gaussian distribution. The operator, together with its Fourier
transform, is illustrated in Figure 2. The form of the operator is given
by:


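$$\nabla^2 G(r, \theta) = \left[ \frac{r^2 - 2\sigma^2}{\sigma^4} \right] \exp\left( -\frac{r^2}{2\sigma^2} \right)$$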
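As a rough illustration of how such an operator can be sampled on a discrete grid, the sketch below evaluates the unnormalized Laplacian of a gaussian at integer offsets from the mask center. The function names, the grid layout, and the sign convention (negative at the center) are assumptions made for illustration. The implementation described in this article uses a difference of gaussians with a fixed coefficient precision, which is not reproduced here.

```python
import math

def laplacian_of_gaussian(r, sigma):
    """Unnormalized Laplacian of a gaussian at distance r from the center."""
    u = (r * r) / (2.0 * sigma * sigma)
    return ((r * r - 2.0 * sigma * sigma) / sigma ** 4) * math.exp(-u)

def make_mask(w, radius):
    """Sample the operator on a (2*radius+1) x (2*radius+1) grid.

    w is the diameter of the central region of the operator;
    sigma follows from the relation sigma = w / (2 * sqrt(2)).
    """
    sigma = w / (2.0 * math.sqrt(2.0))
    return [[laplacian_of_gaussian(math.hypot(x, y), sigma)
             for x in range(-radius, radius + 1)]
            for y in range(-radius, radius + 1)]
```

With this sign convention the operator changes sign at a distance of w/2 from the center, so w is the diameter of the central region.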


Given the form of the operators, it is only left to determine
their size. To do this, we note that Marr and
Hildreth (1979) showed that the operator $\nabla^2 G$ is a close
approximation to the DOG function. Wilson and Bergen's data indicated DOG
filters whose sizes, specified by the width w of the filter's central
excitatory region, range from 3.1' to 21' of visual arc. The variable
w is related to the constant $\sigma$ of $\nabla^2 G$ by the relation:

$$\sigma = \frac{w}{2\sqrt{2}}$$

Wilson and Bergen's values were obtained by using oriented line
stimuli. To obtain the diameter of the corresponding circularly
symmetric center-surround receptive field, the values of w must be
multiplied by $\sqrt{2}$. Finally, we want the resolution of the initial images to
roughly represent the resolution of processing by the cones, and the
size of the filters to represent the size of the retinal operators. In the
most densely packed region of the human fovea, the center-to-center
spacing of the cones is 2.0 to 2.3 μm, corresponding to an angular
spacing of 25 to 29 arc seconds (O'Brien, 1951). Accounting for the
conversion of Wilson and Bergen's data, and using the figure of 27
seconds of arc for the separation of cones in the fovea, one arrives at
values of w in the range 9 to 63 image elements, and hence, values of
$\sigma$ in the range 3 to 23 image elements.
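The unit conversion described above can be followed step by step in a short sketch. The function names are hypothetical, and the rounding used to arrive at the quoted ranges is not stated in the text, so the computed values (roughly 9.7 and 66 image elements for the 3.1' and 21' widths) should be read as close to, rather than identical with, the quoted figures of 9 and 63.

```python
import math

CONE_SPACING_ARCSEC = 27.0  # cone separation figure quoted in the text

def width_in_image_elements(w_arcmin):
    """Convert a line-stimulus width (minutes of arc) to image elements."""
    # Multiply by sqrt(2) to get the diameter of the circularly
    # symmetric field, then express it in units of cone spacing.
    diameter_arcmin = w_arcmin * math.sqrt(2.0)
    return diameter_arcmin * 60.0 / CONE_SPACING_ARCSEC

def sigma_from_width(w):
    """Apply the relation sigma = w / (2 * sqrt(2))."""
    return w / (2.0 * math.sqrt(2.0))
```

For example, `sigma_from_width(9)` is a little under 3.2, in line with the lower end of the quoted range for the constant of the operator.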

Recently, it has been proposed (Marr, Poggio and Hildreth,
1979) that a further, smaller channel may be present. This channel
would have a central excitatory width of w = 1.5', roughly
corresponding to 4 image elements.

The present implementation uses four filters, each of which is a
radially symmetric difference of gaussians, with w values of 4, 9, 17 and
[illegible] image elements. The coefficients of the filters were represented to a
precision of 1 part in 2048. Coefficients of less than [illegible] of the
maximum value of the mask were set to zero. Thus, the truncation radius
of the mask (the point at which all further mask values were treated as
zero) was approximately 1.8w, or equivalently, 0.68a.

The convolutions were performed on a LISP machine
constructed at the MIT Artificial Intelligence Laboratory, using
additional hardware specially designed for the purpose (Knight et al.).
Figures 3 and 4 illustrate some images and their convolutions
with various sized masks.

After the completion of this stage of the algorithm, one has
four filtered copies of each of the images, each copy having been
convolved with a different size mask.


2.3 Detection and description of zero-crossings

According to the Marr-Poggio theory, the elements that are
matched between images are (i) zero-crossings whose orientations
are not horizontal, and (ii) terminations. The exact definition and
hence the detection of terminations is at present uncertain; as a
consequence, only zero-crossings are used as input to the matcher.

Since, for the purpose of obtaining disparity information, we
ignore horizontally oriented segments, the detection of zero-crossings
can be accomplished by scanning the convolved image
horizontally for adjacent elements of opposite sign, or for three
horizontally adjacent elements, the middle one of which is zero, the
other two containing convolution values of opposite sign. This gives
the position of zero-crossings to within an image element.
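The horizontal scan just described can be sketched for a single row of convolution values as follows. The function name, the choice of reporting the left element of an adjacent pair as the crossing position, and the +1/-1 sign labels are assumptions made for illustration, not details taken from the implementation.

```python
def find_zero_crossings(row):
    """Return (position, sign) pairs for one row of convolution values.

    sign is +1 for a negative-to-positive change read left to right,
    and -1 for a positive-to-negative change.
    """
    crossings = []
    n = len(row)
    for x in range(n - 1):
        a, b = row[x], row[x + 1]
        if a * b < 0:
            # Two adjacent elements of opposite sign.
            crossings.append((x, 1 if b > 0 else -1))
        elif b == 0 and x + 2 < n and a * row[x + 2] < 0:
            # Three adjacent elements, the middle one exactly zero.
            crossings.append((x + 1, 1 if row[x + 2] > 0 else -1))
    return crossings
```

Applied to the row `[3, 1, -2, -1, 0, 4, 2]`, the scan reports a positive-to-negative crossing at position 1 and a negative-to-positive crossing at position 4.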

In addition to their location, we record the sign of the
zero-crossings (whether convolution values change from positive to
negative or negative to positive from left to right) and a rough
estimate of the local, two-dimensional orientation of pieces of the
zero-crossing contour. In the present implementation, the orientation
at a point on a zero-crossing segment is computed as the direction
of the gradient of the convolution values across that segment, and
recorded in increments of 30 degrees. Figures 5 and 6 illustrate
zero-crossings obtained in this way from the convolutions of Figures 3 and
4. Positive zero-crossings are shown white, and negative crossings
black. After the completion of this stage, one has a zero-crossing
description of each image for each size of mask.
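One way to obtain such a quantized orientation is sketched below. The use of central differences to estimate the gradient, the full 0 to 360 degree range, and the function name are all assumptions made for illustration. The text specifies only that the gradient direction is recorded in 30 degree increments.

```python
import math

def quantized_orientation(conv, x, y, step_deg=30):
    """Gradient direction at (x, y), rounded to a multiple of step_deg.

    conv is a 2-D list of convolution values indexed as conv[y][x];
    (x, y) must not lie on the border of the array.
    """
    # Central-difference estimate of the gradient (an assumed choice).
    gx = (conv[y][x + 1] - conv[y][x - 1]) / 2.0
    gy = (conv[y + 1][x] - conv[y - 1][x]) / 2.0
    angle = math.degrees(math.atan2(gy, gx)) % 360.0
    return (int(round(angle / step_deg)) * step_deg) % 360
```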


2.4 Matching

The matcher implements the second of the matching algorithms
described by Marr and Poggio (1979, p.315). For each size of
filter, matching consists of 6 steps:

(1)  Fix  the  eye  positions. 

(2)  Locate  a  zero-crossing  in  one  image. 

(3) Divide the region about the corresponding point in the second
image into three pools.

(4) Assign a match to the zero-crossing based on the potential
matches within the pools.

(5) Disambiguate any ambiguous matches.

(6) Assign the disparity values to a buffer.

These steps may be repeated several times during the fusion of
an image. Given a position for the optic axes, these matching steps
are performed, with the results stored in a buffer. These results may
be used to refine the eye positions, causing a new set of retinal images
to be extracted from the scene, and the matching steps are performed
again.


We now expand upon each of the six steps of the matching
process. The first step consists of fixing the two eye positions. The
alignment between the two zero-crossing descriptions, corresponding
to the positions of the optical axes, is determined in two ways. The
initial offsets of the descriptions are arbitrarily set to zero. Thereafter,
the offsets of the two optical axes are determined by accessing the
current disparity values for a region and using these values to adjust
the vergence of the eyes. In this implementation, this is done by
modifying the extraction of the retinal images from the images of the
entire scene, accounting for the positions of the optical axes.

Once the eye positions have been fixed, and the retinal images
extracted, the images are convolved with the DOG filters, and the
zero-crossing descriptions are extracted from the convolved images.
For a zero-crossing description corresponding to a particular mask
size, the matching is performed by locating a zero-crossing and
executing the following operation. Given the location of a zero-crossing
in one image, a horizontal region about the same location in the other
image is partitioned into three pools. These pools form the region
to be searched for a possible matching zero-crossing and consist of
two larger convergent and divergent regions, and a smaller one lying
centrally between them. Together these pools span a disparity range
equal to 2w, where w is the width of the central excitatory region of
the corresponding two-dimensional convolution mask.

The following criteria are used for matching zero-crossings in
the left and right filtered images, for each pool:

(1) the zero-crossings must come from convolutions with the same
size mask.

(2) the zero-crossings must have the same sign.

(3) the zero-crossing segments must have roughly the same
orientation.

A match is assigned on the basis of the number of pools
containing a matching zero-crossing. If exactly one zero-crossing of the
appropriate sign and orientation (within 30 degrees) is found within
a pool, the location of that crossing is transmitted to the matcher. If
two candidate zero-crossings are found within one pool (an unlikely
event), the matcher is notified and no attempt is made to assign a
match for the point in question. If the matcher finds a single
crossing in only one of the three pools, that match is accepted, and the
disparity associated with the match is recorded in a buffer. If two or
three of the pools contain a candidate match, the algorithm records
that information for future disambiguation.
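The pool partition and the assignment rule can be put together in a minimal sketch. Several details here are assumptions made for illustration: the half-width of the central pool (`zero_half_width`), the convention that positive offsets are convergent, the tuple layout of the candidates, and all function names. Candidates are taken to come from the same mask size and the same row of the other image.

```python
def pool_of(offset, w, zero_half_width=1):
    """Name the pool containing a horizontal offset, or None if outside +/-w."""
    if abs(offset) > w:
        return None
    if abs(offset) <= zero_half_width:
        return "zero"
    return "convergent" if offset > 0 else "divergent"

def match_crossing(x, sign, orient, candidates, w, tol_deg=30):
    """Try to match a zero-crossing at x against candidates (x2, sign2, orient2).

    Returns ("match", disparity), ("ambiguous", {pool: disparity}),
    or ("none", None).
    """
    found = {}
    for x2, sign2, orient2 in candidates:
        pool = pool_of(x2 - x, w)
        if pool is None or sign2 != sign:
            continue
        diff = abs(orient2 - orient) % 360
        if min(diff, 360 - diff) > tol_deg:
            continue
        if pool in found:
            # Two candidates in one pool: no match is attempted.
            return ("none", None)
        found[pool] = x2 - x
    if len(found) == 1:
        return ("match", next(iter(found.values())))
    if len(found) >= 2:
        return ("ambiguous", found)
    return ("none", None)
```

A single candidate three elements to the right, with the same sign and orientation, yields `("match", 3)` for w = 4. Adding a second acceptable candidate three elements to the left makes the point ambiguous.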

Once all possible unambiguous matches have been identified,
an attempt is made to disambiguate double or triple matches. This
is done by scanning a neighbourhood about the point in question,
and recording the disparity sign of the unambiguous matches within
that neighbourhood. (Disparity sign refers to the sign of the pool
from which the match comes: divergent, convergent or zero.) If the
ambiguous point has a potential match of the same disparity sign as
the dominant type within the neighbourhood, then that is chosen as
the match (this is the "pulling" effect). Otherwise, the match at that
point is left ambiguous.
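The disambiguation rule can be sketched as a vote among the neighbouring unambiguous matches. The data layout, the function name, and the treatment of ties (the point is left ambiguous) are assumptions made for illustration. The text does not say how a tie between disparity signs is resolved.

```python
from collections import Counter

def disambiguate(options, neighbour_pools):
    """Choose among the candidate matches of an ambiguous point.

    options: {pool_name: disparity} for the ambiguous point.
    neighbour_pools: pool names of unambiguous matches in the neighbourhood.
    Returns the chosen disparity, or None if the point stays ambiguous.
    """
    if not neighbour_pools:
        return None
    counts = Counter(neighbour_pools).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no single dominant type (assumed tie handling)
    return options.get(counts[0][0])
```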

There is the possibility that the region under consideration
does not lie within the disparity range handled by the matcher.
This situation is detected and handled by the following operation.
Consider the case in which the region does lie within the disparity
range ±w. Excluding the case of occluded points, every
zero-crossing in the region will have at least one candidate match (the
correct one) in the other filtered image. On the other hand, if the
region lies beyond the disparity range ±w, then the probability of a
given zero-crossing having at least one candidate match will be less
than 1. In fact, Marr and Poggio show that the probability of a
zero-crossing having at least one candidate match in this case is roughly
0.7. We can perform the following operation in this case. For a given
eye position, the matching algorithm is run for all the zero-crossings.
Any crossing for which there is no match is marked as such. If the
percentage of matched points in any region is less than a threshold of
0.7, then the region is declared to be out of range, and no disparity
values are accepted for that region.
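The range test amounts to comparing the fraction of matched zero-crossings in a region against the threshold. A minimal sketch follows. The function name, the boolean-list input, and the decision to treat an empty region as out of range are assumptions made for illustration.

```python
def region_in_range(match_flags, threshold=0.7):
    """Decide whether a region lies within the matchable disparity range.

    match_flags holds one boolean per zero-crossing in the region,
    True where a match was found.
    """
    if not match_flags:
        return False  # assumed: a region with no crossings is rejected
    fraction = sum(1 for m in match_flags if m) / len(match_flags)
    return fraction >= threshold
```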

The overall effect of the matching process, as driven from the
left image, is to assign disparity values to most of the zero-crossings
obtained from the left image. An example of the output appears
in Figure 7. In this array, a zero-crossing at position (x, y) with
associated disparity d has been placed in a three-dimensional array
with coordinate (x, y, d). For display purposes, the array is shown in
the figures as viewed from a point some distance away. The heights in
the figure correspond to the assigned disparities.

After completion of this stage of the implementation, we have
obtained a disparity array for each mask size. The disparity values
are located only along the zero-crossing contours obtained from that
mask.

2.5 Vergence Control

The Marr-Poggio theory states that in order to obtain fine
resolution disparity information, it is necessary that the smallest
channels obtain a matching. Since the range of disparity over which
a channel can obtain a match is directly proportional to the size of
the channel, this means that the positions of the eyes must be
assigned appropriately to ensure that the corresponding zero-crossing
descriptions from the two images are within a matchable range. The
disparity information required to bring the smallest channels into
their matchable range is provided by the larger channels. That is, if
a region of the image is declared to be out of range of fusion by the
smaller channels, one can frequently obtain a rough disparity value
for that region from the larger channels, and use this to verge the
eyes. In this way, the smaller channels can be brought into a range of
correspondence.

Thus, after the disparities from the different channels have
been combined, there is a mechanism for controlling vergence
movements of the eyes. This operates by searching for regions of the image
which do not have disparity values for the smallest channel, but
which do have disparity values for the larger channels. These large
channel values are used to provide a refinement to the current eye
positions, thereby bringing the smaller channels into a range of
correspondence. Two possible mechanisms for extracting the disparity
value from a region of the image include using the peak value of a
histogram of the disparities in that neighbourhood, or using a local
average of the disparity values. In the current implementation, the
search for such a region proceeds outwards from the fovea.
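The two candidate mechanisms for summarizing the disparity of a region can be compared in a short sketch. The function names and the flat list of disparity values are assumptions made for illustration. The text does not say which of the two the implementation uses.

```python
from collections import Counter

def histogram_peak(disparities):
    """Most frequent disparity value in the neighbourhood."""
    return Counter(disparities).most_common(1)[0][0]

def local_average(disparities):
    """Mean of the disparity values in the neighbourhood."""
    return sum(disparities) / len(disparities)
```

For the values `[2, 3, 3, 3, 8]` the histogram peak is 3, while the average is pulled to 3.8 by the single outlying value, which illustrates one difference between the two choices.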

It should be noted here that although the use of disparity
information from larger channels to drive eye movements, allowing
smaller channels to come into correspondence, is a necessary
condition of the Marr-Poggio theory, it is not necessarily the only such
condition. In other words, there may be other modules of the visual
system which can initiate eye movements, and thereby affect the
input to the matching component, by altering the retinal images
presented to the matcher. An example of this would be the evidence
of Kidd et al. (1979) concerning the ability of texture contours
to facilitate stereopsis by initiating eye movements. However, such
effects are somewhat orthogonal to the question of the sufficiency
of the matching component of the Marr-Poggio theory, since they
affect the input to the matcher, but not the actual performance of the
matching algorithm itself.

2.6 The 2½-Dimensional Sketch

Once the separate channels have performed their matching, the
results are combined and stored in a buffer, called the 2½-D sketch.
There are several possible methods for accomplishing this. As far
as the Marr-Poggio theory is concerned, the important point is that
some type of storage of disparity information occurs. (Perhaps the
strongest argument for this is the fact that up to 2 degrees of disparity
can be held fused in the fovea.)

We shall outline two different possibilities for the combination
of the different channels. The method currently used in the
implementation will be described below. A more biologically feasible
method will be outlined in the discussion.

One of the critical questions concerning the form of the 2½-D
sketch is whether it reflects the scene or the retinal images. For
all the cases illustrated in this article, the sketch was constructed by
directly relating the coordinates of the sketch to the coordinates of
the images of the entire scene. That is, as disparity information was
obtained, it was stored in a buffer at the position corresponding to
the position in the original scene from which the underlying
zero-crossing came. Since disparity information about the scene is
extracted from several eye positions, in order to store this information
into a buffer, explicit information about the positions of the eyes is
required. It will be argued in the discussion that this is probably
inappropriate as a model of the human system. However, for the
purposes of demonstrating the effectiveness of the matching module,
such a representation is sufficient.

The actual mechanism for storing the disparity values requires
some combination of the disparity maps obtained for each of the
channels. Currently, the sketch is updated, for each region of the
image, by writing in the disparity values from the smallest channel
which is within range of fusion. Vergence movements are possible
in order to bring smaller channels into a range of matching for some
region. Further, for those regions of the image in which none of the
channels can find matches, modification of the eye positions over a
scale larger than that of the vergence movements is possible. By this
method, one can attempt to bring those regions of the image into a
range of fusion.
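The update rule for one region can be sketched as a search over the channels in order of increasing mask size. The dictionary layout, the in-range flag per channel, and the function name are assumptions made for illustration. The sketch buffer is modelled here as a plain mapping from scene position to disparity.

```python
def update_sketch(sketch, channel_results):
    """Write the disparities of the smallest in-range channel into the sketch.

    channel_results: {w: (in_range, {(x, y): disparity})} for one region,
    keyed by mask size w. Returns the mask size used, or None if no
    channel is within range of fusion for this region.
    """
    for w in sorted(channel_results):
        in_range, disparities = channel_results[w]
        if in_range:
            for position, d in disparities.items():
                sketch[position] = d
            return w
    return None
```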

There are several possibilities for the actual method of driving
the vergence movements. Two of these were outlined in the previous
section.

The final output of the algorithm consists of a representation
of disparity values in the image, those disparities being restricted to
positions in the image lying along zero-crossing contours.


2.7 Summary of the process

The complete algorithm, as currently implemented, uses four
mask sizes. Initially, the two views of the scene are mapped into
a pair of retinal images. These images are convolved with each
mask. The zero-crossings and their orientation are computed, for
each channel and each view. The initial alignments of the eyes
determine the registration of the images. The matching of the descriptions
from each channel is performed for this alignment. Any points with
either ambiguous matchings or with no match are marked as such.

Next, the percentage of unmatched points is checked, for all
square neighbourhoods of a particular size. This size is chosen so as
to ensure that the measurement of the statistics of matching within
that neighbourhood is statistically sound. Only the disparity points
of those regions whose percentage of unmatched points is below a
certain threshold, determined by the statistical analysis of Marr and
Poggio (1979), are allowed to remain. All other points are removed.
The values which are kept are stored into a buffer. At this stage,
vergence movements may take place, using information from the larger
channels to bring the smaller channels into a range where matching is
possible. Further, if there are regions of the image which do not have
disparity values at any level of channel, an eye movement may take
place in an attempt to bring those portions of the image into a range
where at least the largest mask can perform its matching.

Note that the matching process takes place independently for
each of the four channels. Once the matching of each channel is
complete, the results are combined into a single representation of the
disparities.

The final output is thus a disparity map, with disparities
assigned along most portions of the zero-crossing contours obtained
from the smallest masks. The accuracy of the disparities thus
obtained depends on how accurately the zero-crossings have been
localized, which may, of course, be to a resolution much finer than the
initial array of intensity values that constitutes the image.

3.  Examples  and  Assessment  of  Performance 

A standard tool in the examination of human stereo perception is
the random dot stereogram (Julesz, 1960, 1971). This is a pair of stereo
images where each image, when viewed monocularly, consists only of
randomly distributed dots, yet when viewed stereoscopically, may be
fused to yield patterns separated in depth. Such patterns are a useful
tool for analysing the stereo component of the human visual system,
since there are no visual cues other than the stereoscopic ones. We
can test the sufficiency of the algorithm by comparing human
perception with the performance of the algorithm on such patterns. As well,
since random dot stereograms have well demarked disparity values, it
is easy to assess the correctness of the algorithm's performance on such
patterns.

Table 1 lists some of the matching statistics for various random dot
patterns. These are illustrated in Figures 8-13 and discussed below.

The first pattern consisted of a central square separated in depth
from a second plane. The pattern had a dot density of 50% and its
analysis is shown in Figure 7. Each dot was a square with four image
elements on a side. For the algorithm, this corresponds to a dot of
approximately two minutes of visual arc. The total pattern was 320 image
elements on a side. The central plane of the figure was shifted 12 image
elements in one image relative to the other. The final disparity map
assigned after the matching of the smallest channel had the following
statistics. The number of zero-crossing points in the left description
which were assigned a disparity was 11847. Of these 11847, 11830
were disparity values which were exactly correct, and an additional 14
deviated by one image element from the correct value. Approximately
0.03% of the matched points, or roughly 3 points in 10000, were
incorrectly matched.
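The quoted error rate follows directly from the three counts, as the short calculation below shows. The function name is hypothetical; the counts are those reported above for the 50% density pattern.

```python
def matching_statistics(assigned, exact, off_by_one):
    """Return (number of incorrect matches, percentage of incorrect matches).

    A match counts as incorrect when it deviates from the correct
    disparity by more than one image element.
    """
    wrong = assigned - exact - off_by_one
    return wrong, 100.0 * wrong / assigned

wrong, percent = matching_statistics(11847, 11830, 14)
```

Here `wrong` is 3 and `percent` is about 0.025, which the text rounds to 0.03%.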

A similar test was run on patterns with a dot density of 25%, 10%
and 5%. The results are illustrated in Figure 8.

For each of these cases, the number of incorrectly matched points
was extremely low. Those points which were assigned incorrect
disparities all occurred at the border between the two planes, that is, along
the discontinuity in disparity.

A more complex random dot pattern consisted of a wedding cake,
built from four different planar layers, each separated by 8 image
elements, or 2 dot widths. This is illustrated in Figure 9.

In this case, the number of zero-crossing points assigned a
disparity was 11162. Of these points, 11095 were assigned a disparity
value which was exactly correct, and an additional 61 deviated from
the correct value by one image element. Approximately 0.06% of the
points were incorrectly matched. Again, these incorrect points all
occurred at the boundaries between the planes. A second complex pattern
is illustrated in Figure 9. The object is a spiral with a range of
continuously varying disparities.

There are a number of special cases of random dot patterns which
have been used to test various aspects of the human visual system. The
algorithm was also tested on several of these stereograms. They are
outlined below and a comparison between the performance of the
algorithm and human perception is given.

It is known that if one or both of the images of a random dot
stereogram are blurred, fusion of the stereogram is still possible (Julesz
1971, p.96). To test the algorithm in this case, the left half of a 50%
density pattern was blurred by convolution with a gaussian mask. This
is illustrated in Figure 10. The disparity values obtained in this case
were not as exact as in the case of no blurring. Rather, there was a
distribution of disparities about the known correct values. As a result, the
percentage of points that might be considered incorrect (more than one
image element deviation from the correct value) rose to 6%. However,
the qualitative performance of the algorithm is still that of two planes
separated in depth. It is interesting to note that the slight distribution of
disparity values about those corresponding to the original planes is
consistent with the human perception of a pair of slightly warped planes.

Julesz and Miller (1975) showed that fusion is also possible in the
presence of some types of masking noise. In particular, if the spectrum
of the noise is disjoint from the spectrum of the pattern, it can be
demonstrated that fusion of the pattern is still possible. Within the
framework of the Marr-Poggio theory, this is equivalent to stating that
if one introduces noise of such a spectrum as to interfere with one of
the stereo channels, fusion is still possible among the other channels,
provided the noise does not have a substantial spectral component
overlapping other channels as well. This was tested on the algorithm by
high pass filtering a second random dot pattern, to create the noise,
and adding the noise to one image. In the case illustrated in Figures
10 and 11, the spectrum of the noise was designed to interfere
maximally with the smallest channel. In the case shown by HNOISE1 and
HNOISE2 in Table 1, the noise was added such that the maximum
magnitude of the noise was equal to the maximum magnitude of the original
image. HNOISE1 illustrates the performance of the smallest channel.
HNOISE2 illustrates the performance of the next larger channel. It can
be seen that for this case, some fusion is still possible in the smallest
channel, although it is patchy. The next larger channel also obtains
fusion. In both cases, the accuracy of the disparity values is reduced from
the normal case. This is to be expected, since the introduction of noise
tends to displace the positions of the zero-crossings. In the case shown
by HNOISE3 and HNOISE4 in Table 1, the noise was added such that
the maximum magnitude was twice that of the maximum magnitude of
the original image. Here, matching in the smallest channel is almost
completely eliminated (HNOISE3). Yet matching in the next larger
channel is only marginally affected (HNOISE4).
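The way the noise amplitude was set relative to the image can be sketched
as follows. This is only an illustration under stated assumptions: the
function name add_scaled_noise and its ratio parameter are hypothetical,
images are plain lists of rows, and the noise is assumed to have been
band-pass filtered already (the filtering itself is not shown).

```python
def add_scaled_noise(image, noise, ratio=1.0):
    """Add noise so that max|noise| equals ratio * max|image|.

    ratio=1.0 mirrors the HNOISE1/HNOISE2 setting described in the text,
    ratio=2.0 the HNOISE3/HNOISE4 setting. Assumes the noise is not all zero.
    """
    peak_image = max(abs(v) for row in image for v in row)
    peak_noise = max(abs(v) for row in noise for v in row)
    gain = ratio * peak_image / peak_noise
    return [[p + gain * n for p, n in zip(image_row, noise_row)]
            for image_row, noise_row in zip(image, noise)]
```

With ratio=2.0 the same noise pattern is simply added at twice the gain.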

The implementation was also tested on the case of adding low pass
filtered noise to a random dot pattern, with results similar to those of
adding high pass filtered noise. Here, the larger channels are unable
to obtain a good matching, while the smaller channels are relatively
unaffected.

If one of the images of a random dot pattern is compressed in
the horizontal direction, the human stereo system is still able to achieve
fusion (Julesz 1971, p.213). The algorithm was tested on this case, and
the results are shown in Figure 11. It can be seen that the program still
obtains a reasonably good match. The planes are now slightly slanted,
which agrees with human perception.

If some of the dots of a pattern are decorrelated, it is still possible
for a human observer to achieve some kind of fusion (Julesz 1971, p.88).
Two different types of decorrelation were tested. In the first type,
increasing percentages of the dots in the left image were decorrelated at
random. In particular, the cases of 10%, 20% and 30% were tried, and
are illustrated in Figure 12. For the 10% case (table entry Uncorr1) it
can be seen that the algorithm was still able to obtain a good matching
of the two planes, although the total number of zero-crossings assigned
a disparity decreased, and the percentage of incorrectly matched points
increased. When the percentage of decorrelated dots was increased to
20% (table entry Uncorr2), the number of matched points decreased
again, although the percentage of those which were incorrectly matched
remained about the same. Finally, when the percentage of decorrelated
dots was increased to 30% (table entry Uncorr3), the algorithm found
virtually no section of the image which could be fused.

The failure of the algorithm to match the 30% decorrelated pattern
is caused by the component of the algorithm which checks that
each region of the image is within range of correspondence. Recall that
in order to distinguish between the case of two images beyond range
of fusion (for the current eye positions), which will have only randomly
matching zero-crossings, and the case of two images within range of
fusion, the Marr-Poggio theory requires that the percentage of unmatched
points is less than some threshold. This threshold is approximately 0.3,
according to the statistical analysis of Marr and Poggio (1979). For the
case of the pattern with 30% decorrelation, on the average, each region
of the image will have roughly 30% of its zero-crossings different and
hence the algorithm decides that the region is out of range of
correspondence. Hence, no disparities are accepted for this region.
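The range-of-correspondence check described above amounts to a simple
threshold test per region. The following is a minimal sketch, assuming a
region is summarised by two counts; the function name and argument names
are hypothetical, and only the threshold of approximately 0.3 comes from
the text.

```python
def region_in_range(n_zero_crossings, n_unmatched, threshold=0.3):
    """Accept a region's disparities only if the fraction of unmatched
    zero-crossings is below the threshold (about 0.3 in the text)."""
    if n_zero_crossings == 0:
        return False  # nothing to match, so nothing to accept
    return n_unmatched / n_zero_crossings < threshold
```

A region with roughly 30% of its zero-crossings unmatched sits at the
threshold and is therefore rejected, which is the behavior reported for
the 30% decorrelated pattern.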

For the algorithm, the computational reason for the failure to
process patterns with 30% decorrelation is that it could not distinguish
a correctly matched region of such a pattern from a region which was
out of range of correspondence, but had a random set of matches for
many of the points in the region. It is interesting to note that many
human subjects observe a similar behavior; that is, some kind of fusion
for up to 20% decorrelation, although the fusion becomes increasingly
weaker, and virtually no fusion for patterns with 30% decorrelation.

One can also decorrelate the pattern by breaking up all white
triplets along one set of diagonals, and all black triplets along the other
set of diagonals (Julesz 1971, p.87). The table entry Uncorrd indicates
the matching statistics for this case. Again, it can be seen that the
program still obtains a good match, as do human observers. The
performance of the algorithm is illustrated in Figure 13.

4.  Statistics 

A number of parameters are important for the theory, which
makes assumptions about them, and they have been measured on random
dot images. The worst cases occur for patterns with a density of
50%, and for such patterns the worst case values encountered for the
parameters have the values shown in Table 2. The theoretical worst case
bounds used by Marr and Poggio appear for comparison.

5.  Natural  Images 

One can test the implementation on natural scenes, as well as random
dot stereograms. In this case, it is more difficult to assess the exact
performance of the algorithm. However, qualitative performance can be
measured and some examples are shown in Figure 14. In these cases,
as well as other natural images on which the algorithm has been tested,
the disparity values obtained by the algorithm are seen to be in good
qualitative agreement with the shapes of the surfaces in the scenes.


6. Acknowledgements

Without David Marr and Tomaso Poggio, this work would have
been impossible. Ellen Hildreth, Keith Nishihara and Shimon Ullman
provided many useful comments and suggestions.

7. References

Braddick, O. 1978 Multiple matching in stereopsis. (unpublished
MIT report).

Campbell, F.W. and Robson, J. 1968 Application of Fourier analysis
to the visibility of gratings. J. Physiol., Lond. 197, 551-566.

Grimson, W.E.L. A refinement of a computational theory of human
stereo vision. In preparation.

Grimson, W.E.L. and Marr, D. 1979 A computer implementation
of a theory of human stereo vision. Proceedings: Image Understanding
Workshop, 41-47.

Julesz, B. 1960 Binocular depth perception of computer-generated
patterns. Bell System Tech. J. 39, 1125-1162.

Julesz, B. 1971 Foundations of cyclopean perception. Chicago: The
University of Chicago Press.

Julesz, B. and Miller, J.E. 1975 Independent spatial-frequency-tuned
channels in binocular fusion and rivalry. Perception 4, 125-143.

Kidd, A.L., Frisby, J.P. and Mayhew, J.E.W. 1979 Texture contours
can facilitate stereopsis by initiating appropriate vergence eye
movements. Nature 280, 829-832.

Knight, T.F., Moon, D.A., Holloway, J., and Steele, G.L. 1979
CADR. MIT Artificial Intelligence Laboratory Memo 528.

Marr, D. and Hildreth, E. 1980 Theory of edge detection. Proc. R.
Soc. Lond. (in the press).

Marr, D. and Nishihara, H.K. 1978 Representation and recognition
of the spatial organization of three-dimensional shapes. Proc. R.
Soc. Lond. B 200, 269-294.

Marr, D. and Poggio, T. 1976 Cooperative computation of stereo
disparity. Science, N.Y. 194, 283-287.

Marr, D. and Poggio, T. 1979 A computational theory of human
stereo vision. Proc. R. Soc. Lond. B 204, 301-328.

Marr, D., Poggio, T. and Hildreth, E. 1979 The smallest channel in
early human vision. JOSA (submitted for publication).

Mayhew, J.E.W. and Frisby, J.P. 1976 Rivalrous texture stereograms.
Nature, Lond. 264, 53-56.

O'Brien, B. 1951 Vision and resolution in the central retina. J. Opt.
Soc. Am. 41, 882-894.

Ullman, S. 1979 The interpretation of visual motion. Cambridge:
MIT Press.

Wilson, H.R. and Bergen, J.R. 1979 A four mechanism model for
spatial vision. Vision Res. (in the press).

Wilson, H.R. and Giese, S.C. 1977 Threshold visibility of frequency
gradient patterns. Vision Res. 17, 1177-1190.


TABLE OF MATCHES

pattern   density   total   exact   one pixel   wrong   %wrong
square    50%       11347   11830   14          3       .03
square    25%       9661    9632    22          7       .07
square    10%       5236    5264    20          2       .04
square    5%        3500    3498    0           2       .06
wedding   50%       11162   11095   61          6       .06
hnoise1   50%       2270    1909    346         15      .7
hnoise2   50%       8683    6621    1868        194     2.
hnoise3   50%       63      28      24          11      17.
hnoise4   50%       8543    5194    2864        485     6.
uncorr1   50%       9545    9091    263         191     2.
uncorr2   50%       4343    4120    143         80      2.
uncorr3   50%       134     127     2           5       4.
uncorrd   50%       6753    6325    271         157     2.

Table 1.


TABLE OF STATISTICS

parameter                          expected worst   large     medium    small
                                   case behavior    channel   channel   channel
                                                    w = 35    w = 17    w = 9

average distance between
zero-crossings of same sign        2 w              1.51 w    1.88 w    1.87 w

probability of candidates
in at most one pool                >.50             .77       .75       .69

probability of candidates
in two pools                       <.45             .21       .25       .31

probability of candidates
in all three pools                 <.05             .02       .01       .01

given a candidate near zero,
probability of no other
candidates                         >.9              .88       .85       .87

Table 2.




[Diagram stages: Convolve, Zero-Crossing, Match, Combine Channels]

Figure 1. Diagram of the algorithm. The images of the scene are mapped into the images of the retinas,
taking into account the eye positions. Each image is convolved with a set of different sized masks and
zero-crossings are located for each convolution. For each size mask, the left and right zero-crossing descriptions are
matched. These are combined into a single representation. As well, the matches from the larger channels can
drive eye vergence movements, causing new retinal images to be created and allowing the smaller channels to
come into correspondence.
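The zero-crossing stage named in the caption can be illustrated in one
dimension. This is a sketch only: it assumes the convolution output is
already available as a list of numbers along a scan line, the function
name is hypothetical, and the sign convention (+1 for a rise through zero,
-1 for a fall) is an assumption rather than something stated in the text.

```python
def zero_crossings(values):
    """Locate sign changes between neighbouring samples of a convolved
    scan line. Returns (index, sign) pairs, where the crossing lies between
    index and index + 1. Samples that are exactly zero are ignored here."""
    crossings = []
    for i in range(len(values) - 1):
        a, b = values[i], values[i + 1]
        if a * b < 0:
            crossings.append((i, 1 if b > a else -1))
    return crossings
```

Matching would then compare crossings of the same sign between the left
and right descriptions, one channel at a time.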




Figure 3. Examples of convolutions with ∇²G. The top figure shows a natural image. The bottom
figures show the convolution of this image with a set of ∇²G operators. The sizes of these operators are
w = 36, 18, 9 and 4 image elements.




Figure 4. Examples of convolutions with ∇²G. The top figure shows a random dot pattern. The bottom
figures show the convolution of this image with a set of ∇²G operators. The sizes of these operators are
w = 36, 18, 9 and 4 image elements.




Figure 5. Examples of zero-crossing descriptions. The top figure shows a natural image. The bottom
figures show the zero-crossings obtained from the convolutions of Figure 3. The white lines mark positive
zero-crossings and the black lines, negative ones.




Figure 6. Examples of zero-crossing descriptions. The top figure shows a random dot pattern. The bottom
figures show the zero-crossings obtained from the convolutions of Figure 4. The white lines mark positive
zero-crossings and the black lines, negative ones.




Figure 7. Results of the algorithm. The top stereo pair is an image of a painted coffee jar. [...]
disparity map. This agrees with the fused perception. The second stereo pair is a 50% [...]


Figure 8. The top stereo pair is a 25% density random dot pattern. The disparity map below it is
displayed as in Figure 7. The bottom stereo pair is a 5% density random dot pattern. Its disparity map is
shown below it. Both disparity maps are obtained from the w = 4 channel.




Figure 9. The top stereo pair is a 50% density wedding cake, composed of four planar levels. The
disparity map is shown below it. The bottom stereo pair is a 50% spiral. The disparity map is shown below it,
in a manner similar to Figure 7. Both disparity maps are obtained from the w = 4 channel.




Figure 10. The top stereo pair is a 50% density pattern in which the left image has been blurred. The
disparity map is shown below it. It can be seen that two planes are still evident, although they are not as
sharply defined as in Figure 7 or Figure 8. The disparity map is that obtained from the w = 4 channel. The
bottom stereo pair is a 50% density pattern. The left image has had high pass filtered noise added to it so that
the maximum magnitude of the noise is equal to the maximum magnitude of the image. The disparity map
shown is that obtained by the w = 9 channel.


Figure 11. The top stereo pair is a 50% density pattern [...]
added to it so that the maximum magnitude of the noise [...]
disparity map is that obtained from the w = 9 channel [...]
w = 4 channel. It can be seen that the w = 4 channel [...]
The bottom stereo pair is a 50% density pattern in which [...]
direction. The disparity map from the w = 4 channel is displayed [...]
evident, although the entire pattern appears slanted. This [...]




Figure 12. The top stereo pair is a 50% density pattern in which the left image has had 10% of the dots
decorrelated. The disparity map is shown below. The bottom stereo pair is a 50% density pattern in which
the left image has had 20% of the dots decorrelated. The disparity map is shown below. Note that in this case
there are large regions of the image for which no match was made.




EXPERIENCE  WITH  THE 
GENERALIZED  HOUGH  TRANSFORM 


K.  R.  Sloan,  Jr. 
D.  H.  Ballard 


The  University  of  Rochester 
Rochester, New York 14627


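Abstract

The Hough Transform is a method for detecting curves by exploiting the duality
between points on a curve and parameters of that curve. The initial work showed how to
detect both analytic curves [Hough, 1962; Duda and Hart, 1972] and non-analytic curves
[Merlin and Farber, 1975] in the case of binary edge images. This work was
generalized to the detection of some analytic curves in grey level images,
specifically lines [O'Gorman and Clowes, 1973], circles [Kimme et al., 1975], and parabolas
[Wechsler and Sklansky, 1975].

Recently, the Hough technique has been extended to the detection of arbitrary
non-analytic shapes in grey level images [Ballard, 1979]. This shape detection scheme
has been implemented and tested on a variety of artificial images and has found
application in the analysis of real aerial images. Experience to date indicates that
the technique is robust in the presence of occlusion, but requires reliable edge-element
orientation determination.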


1. Introduction

Shape is an important attribute of
two-dimensional figures. In simple
figure-ground binary images, the shape of
the boundary of the figure is often the
only interesting feature. We take shape
to be a property of the entire figure,
i.e., it is a global property.


Evidence  about  the  shape  of  a  figure 
is  found  at  the  boundary  between  figure 
and  ground.  Such  evidence  can  be 
generated  by  the  application  of  local 
edge-element  detectors.  An  edge-element 
detector  typically  reports  on  the  presence 
of  an  edge-element  in  a  small  window  of  an 
image,  and  on  the  orientation  of  that 
edge-element.  Finding  shapes  in  the  image 
involves  combining  many  pieces  of  local 
evidence  into  a  global  judgment. 

The Hough Transform is a method for
detecting curves by exploiting the duality
between points on a curve and parameters
of that curve. The initial work showed
how to detect both analytic curves [Hough,
1962; Duda and Hart, 1972] and
non-analytic curves [Merlin and Farber,
1975], in the case of binary edge images.
This work was generalized to the detection
of some analytic curves in grey level
images, specifically lines [O'Gorman and
Clowes, 1973], circles [Kimme et al.,
1975], and parabolas [Wechsler and
Sklansky, 1975].


Recently, the Hough technique has
been extended to the detection of
arbitrary non-analytic shapes in grey
level images [Ballard, 1979]. Given an
arbitrary shape, S, this generalized Hough
technique provides a mapping from the
orientation of an edge-element to the set
of instances of S (as modified by
location, rotation, and uniform scaling)
which could have given rise to that
edge-element. This mapping allows all
local evidence for a particular instance
of S to contribute to global decisions
about the figure.

This shape detection scheme has been
implemented and tested on a variety of
artificial images and has found
application in the analysis of real aerial
images. Experience to date indicates that
the technique is robust. Also, with
appropriate "focus of attention"
mechanisms, which are present in our
implementation, the method is
efficient. However, the reliable
determination of edge-element orientation
is crucial to the success of this method.




2. Hough Techniques

All Hough techniques for shape
detection consist of the following basic
elements:

a)  a local edge-element detector,
    E,

b)  an n-dimensional parameter
    space, P, quantized and
    represented by an n-dimensional
    Accumulator Array, AA,

c)  a mapping, M, from the
    information provided by E into P
    (and thus AA),

d)  a voting rule, V, specifying how
    a particular edge-element
    affects the values of AA,

e)  a detection rule, D, specifying
    the conditions under which a
    particular shape has been
    detected.

Given these basic elements, shapes
are found by the following procedure:

a)  zero AA,

b)  apply E everywhere in the image,

c)  for each edge-element found,
    apply M to locate cells in AA.
    Then apply V to modify the
    contents of these cells (i.e.,
    vote for all possible "causes"
    of this edge-element),

d)  finally, apply D to AA (choose
    the most popular shape).

Clearly, application of this
technique depends on the ability to
parameterize the shapes of interest, and
the derivation of the mapping M from
edge-element information to possible shape
parameters.
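The four-step procedure can be sketched as a small driver that takes E,
M, V and D as functions. This is an illustrative skeleton, not the
authors' implementation; all function names are hypothetical, and the toy
instantiation (finding the most popular vertical line x = c) is chosen
only to keep the example short.

```python
from collections import Counter

def hough_detect(image, detect_edges, mapping, vote, decide):
    """Run the four-step procedure with caller-supplied E, M, V and D."""
    accumulator = Counter()                  # a) zero AA
    for edge in detect_edges(image):         # b) apply E everywhere in the image
        for cell in mapping(edge):           # c) apply M to locate cells in AA ...
            vote(accumulator, cell, edge)    #    ... then apply V to those cells
    return decide(accumulator)               # d) apply D to AA

# Toy instantiation (hypothetical): find the most popular vertical line x = c.
def toy_edges(points):
    return points                            # E: every listed point is an edge-element

def toy_mapping(edge):
    return [edge[0]]                         # M: a point (x, y) supports the line x = c

def toy_vote(accumulator, cell, edge):
    accumulator[cell] += 1                   # V: one vote per supporting edge-element

def toy_decide(accumulator):
    return accumulator.most_common(1)[0][0]  # D: choose the most popular cell

best_x = hough_detect([(3, 0), (3, 5), (3, 9), (7, 2)],
                      toy_edges, toy_mapping, toy_vote, toy_decide)
```

Keeping E, M, V and D as separate arguments mirrors the decomposition in
the text: changing the shape family only changes M.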

Lines 

The original Hough transform
capitalized on the observation that
straight lines can be completely specified
by two parameters (e.g., an orientation,
θ, and a distance from the origin, ρ).
What is more, the mapping, from a
particular edge-element position to the
set of straight lines it might be a part
of, is easy to compute [Hough, 1962; Duda
and Hart, 1972]. The idea is that an
actual line in the image will give rise to
many local edge-elements, all of which
"vote" for that line. Individual
edge-elements will also vote for other
lines, but the "correct" line will receive
the most votes.

If the edge-element operator, E,
provides directional information, then
each edge-element maps to a unique line.
Edge elements which line up vote for
"their" line, and the line with the most
visible edge-elements gets the most votes.
It is not necessary for the
edge-elements to be connected (or even be
near each other) in order that their votes
reinforce one another - they must simply
be collinear.
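When direction is available, each edge-element casts a single vote. The
sketch below assumes each edge-element is a tuple (x, y, theta), where
theta (0 <= theta <= pi) is the direction of the normal to the line, and
uses the normal form rho = x cos(theta) + y sin(theta). The function
names, the bin counts and the tuple layout are assumptions made for
illustration.

```python
import math
from collections import Counter

def line_cell(x, y, theta, theta_bins=180, rho_step=1.0):
    """Map one directed edge-element to its unique (theta, rho) cell."""
    t = round(theta / math.pi * theta_bins)
    rho = x * math.cos(theta) + y * math.sin(theta)
    if t == theta_bins:
        # theta rounds up to pi: this is the same line as theta = 0
        # with the sign of rho reversed.
        t, rho = 0, -rho
    return t, round(rho / rho_step)

def line_votes(edge_elements, theta_bins=180, rho_step=1.0):
    """Accumulate one vote per edge-element; collinear elements share a cell."""
    accumulator = Counter()
    for x, y, theta in edge_elements:
        accumulator[line_cell(x, y, theta, theta_bins, rho_step)] += 1
    return accumulator
```

Three widely separated elements on the line x = 5, all with theta = 0,
land in the same cell even though they are not adjacent in the image.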

Circles 

The description of circular figures
in an image requires three parameters: x,
y, ρ. The location of the center of the
circle is given by <x,y> and the radius is
given by the scale parameter, ρ. Once
again, each edge-element in the image is
evidence for a set of <x,y,ρ> triples.

If the direction of the edge-element
is unknown, then the locus of points in
parameter space representing circles which
could have created this edge-element forms
a right circular cone. In the presence of
direction information, this locus is
reduced to a line [Ballard, 1979]. As
with line detection, circles which
actually appear in the image will receive
many votes; those which do not will
receive few votes.
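The reduction of the locus to a line can be made concrete: for each
candidate radius, a directed edge-element points back to exactly one
center. The sketch below assumes edge-elements are tuples (x, y, phi) with
phi the direction pointing outward from the circle center; this sign
convention, the function name and the set of candidate radii are
assumptions for illustration.

```python
import math
from collections import Counter

def circle_votes(edge_elements, radii):
    """Vote along the line in <x, y, rho> space traced by each
    directed edge-element: one cell per candidate radius."""
    accumulator = Counter()
    for x, y, phi in edge_elements:
        for r in radii:
            center_x = round(x - r * math.cos(phi))
            center_y = round(y - r * math.sin(phi))
            accumulator[(center_x, center_y, r)] += 1
    return accumulator
```

Without direction information the inner loop would also have to sweep phi
over all angles, which is the cone described in the text.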

Arbitrary  Shapes 

The  Hough  technique  can  be  extended 
to  analytic  shapes  for  which  the  mapping 
from  edge-element  to  a  locus  of  points  in 
parameter  space  can  be  derived.  Given 
certain  assumptions  about  the  meaning  of 
"shape",  we  can  also  extend  the  technique 
to  arbitrary,  non-analytic  shapes. 

Consider a particular figure (e.g.,
an ellipse centered at <1,2> with its
major axis parallel to the x-axis and of
length 10, and its minor axis of length
5). Now, consider the set of figures
which can be produced by translating,
rotating, and uniformly scaling the
original figure. For our purposes, all of
these figures have the same shape.

The parameter space which captures
this notion of shape is:

    P = <x, y, ρ, θ>

where

    <x,y> is the origin of a local
          co-ordinate system

    ρ     is a scale factor

    θ     is a rotation about <x,y>.

This is the parameter space used in our
generalized Hough Transform. Note that
the Hough-spaces developed above for lines
and circles are sub-spaces of P.

The key to all Hough techniques is
the mapping from edge-element information
to a locus of points in P. We assume an
edge-element operator which provides
directional information. As seen above,
this directional information can
drastically reduce the image of the
edge-element in P. The technique
therefore depends strongly on the
accuracy of the edge-element direction.

We represent the mapping from
edge-element location and orientation to
figure parameters directly in an
"R-Table" (see Figure 2). The
orientation of an edge-element is used as
an index into this table. Stored there is
a set of <x,y> vectors. When added to the
<x,y> location of the edge-element in the
image, these vectors point to possible
locations for the origin of the figure's
local co-ordinate system (its reference
point). This map is easy to build, given
an original master shape.

The expansion of the R-Table mapping
to cover the remainder of P is performed
dynamically by our voting procedure, V.
This involves rotating the edge-element
orientation before using it as an index
into the R-Table, and rotating and scaling
the table entries thus found before
computing the hypothesized reference point
of the figure.

3. Experimental [...]

The generalized Hough Transform
described above has been implemented and
tested on a variety of artificial images
and has found application in the analysis
of real aerial images. Experience to date
indicates that the technique is robust,
given that the edge-element operator used
to generate local evidence for the shape
can provide reliable information about
edge-element direction.

R-Tables

An R-Table contains the mapping from
edge-element information (position and
orientation) to a hyperplane in
parameter space. This mapping is derived
from an explicit master shape, in the form
of a sequence of boundary points.
Typically, we sketch (or trace) a shape.
In order to ease the pain of carefully
drawing a particular shape, we customarily
sample the master shape boundary rather
coarsely and then fill in the intermediate
points using a B-spline approximation. An
arbitrary reference point is chosen as the
origin of the local co-ordinate system for
the shape. For each point on the master
shape boundary, we calculate the
orientation of the boundary edge-element
at that point and the vector from the
boundary edge-element to the origin of the
local co-ordinate system. This is exactly
an R-Table entry. The current
implementation of the R-Table is a list of
entries, tagged with the edge-element
orientation, each containing a list of
reference point vectors. Any scheme which
associates edge-element orientation with
reference point vectors will do.

Edge Detection

The examples shown below used a
simple Sobel edge-element finder. In
general, this is satisfactory. When, as
in one example below, this does not
provide reliable edge-element orientation,
performance deteriorates.

Detection Criteria

In these examples, the shape found by
the generalized Hough Transform is
selected by simply selecting the maximum
value found in a smoothed (over a window)
Accumulator Array. This does the right
thing when, as in most of our examples,
the maxima in the Accumulator Array are
sharp peaks. For more problematic, noisy
situations, clustering in parameter space
may be required.
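The R-Table construction and the rotate-and-scale voting step can be
sketched as follows. This is a minimal illustration under several
assumptions that are not in the text: orientations are angles in radians
quantized into n_bins bins, the table is a dictionary from orientation
bin to a list of vectors, candidate scales and rotations are supplied as
explicit lists, and all function names are hypothetical. Peak selection
here is a plain maximum with no smoothing.

```python
import math
from collections import Counter, defaultdict

def orientation_bin(phi, n_bins):
    """Quantize an orientation (radians) into one of n_bins bins."""
    return int(round(phi / (2 * math.pi) * n_bins)) % n_bins

def build_r_table(boundary, reference, n_bins=36):
    """boundary: (x, y, orientation) samples of the master shape.
    Each entry stores the vector from a boundary point to the reference point,
    filed under that point's orientation."""
    table = defaultdict(list)
    ref_x, ref_y = reference
    for x, y, phi in boundary:
        table[orientation_bin(phi, n_bins)].append((ref_x - x, ref_y - y))
    return table

def generalized_hough(edge_elements, table, scales, rotations, n_bins=36):
    """Vote in <x, y, rho, theta> space for every hypothesised scale and rotation."""
    accumulator = Counter()
    for x, y, phi in edge_elements:
        for theta in rotations:
            # Undo the hypothesised rotation before indexing the table.
            entries = table.get(orientation_bin(phi - theta, n_bins), [])
            cos_t, sin_t = math.cos(theta), math.sin(theta)
            for dx, dy in entries:
                for rho in scales:
                    # Rotate and scale the stored vector, then add it to the
                    # edge-element location to get a reference point.
                    vx = rho * (cos_t * dx - sin_t * dy)
                    vy = rho * (sin_t * dx + cos_t * dy)
                    accumulator[(round(x + vx), round(y + vy), rho, theta)] += 1
    return accumulator
```

The most popular cell, accumulator.most_common(1), then names the
hypothesised reference point, scale and rotation of the shape.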


Artificial Images

Figure 7 illustrates a few of the
features of the generalized Hough
Transform. Artificial images provide
controlled conditions for our testing. In
Figures 7a and 7b we see that, as
expected, the method has no difficulty in
finding the central shape at arbitrary
scale and orientation. (The heavy black
dots show the shape, as drawn using the
R-Table and the parameter choices which
received the most votes in the Accumulator
Array; the central black dot is the
reference point.) In Figure 7c we see
what happens to the same shape, obscured
by another. Figure 7d demonstrates that
there is enough evidence for the desired
shape to correctly determine its location,
orientation, and scale.

All of these images, of course, have
very clean edges and the Sobel operator
has no difficulty correctly determining
edge-element orientation. By way of
contrast, see Figure 7e. In this image,
which has been degraded by the addition of
considerable noise, the edge-element
orientations found by the Sobel operator
are hopelessly jumbled. The generalized
Hough Transform (which depends strongly on
the accuracy of edge-element orientation)
is unable to locate the shape. The guess
shown is not much more than that: the
Accumulator Array has no very strong peak,
and we simply show the instance which
received the most votes in a very close
election.

Aerial  Photographs 

The location of arbitrary,
non-analytic shapes is not merely of
interest in artificial images such as that
shown above. The original version of the
shape found above came from the aerial
image shown as Figure 7f. Even the
experimental version of the generalized
Hough Transform has no difficulty in
locating the pond in this image.

4. Focus of Attention

One of the difficulties encountered
in the application of this technique to
real images, for the location of real
shapes, is that the area searched for
evidence of boundaries (the application of
the edge-element detector and the mapping
from edge-element information to parameter
space) and the size of the parameter space
can quickly become very large. The
solution to these problems is, of course,
to attempt to focus attention where
possible.

Where  to  Look 

One way to focus attention during the
application of this shape-finding
technique is to constrain the area
searched for evidence of the figure's
boundary [Russell and Brown, 1978;
Russell, 1979]. In a system which
routinely applies an edge operator over
the entire image, this may not seem to be
a solution (or even a problem). However,
even after the edge-elements have been
found, it is still necessary to apply the
mapping to parameter space (one per
desired shape). Our implementation
includes the usual "bounding rectangle"
limitation on the area in which
edge-elements are to be found and mapped
to parameter space. This improves
performance significantly.
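A bounding-rectangle limitation of this kind reduces to a filter applied
before the mapping step. The sketch below assumes edge-elements are tuples
whose first two fields are x and y; the function name and the inclusive
bounds are assumptions for illustration.

```python
def within_rectangle(edge_elements, x_min, y_min, x_max, y_max):
    """Keep only the edge-elements inside the bounding rectangle, so that
    the mapping to parameter space is applied to fewer elements."""
    return [e for e in edge_elements
            if x_min <= e[0] <= x_max and y_min <= e[1] <= y_max]
```

Only the surviving edge-elements are then passed on to the voting
procedure.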

What to Look For

The second obvious way to focus
attention is to constrain the objects
being sought. Of course, a single
application of the generalized Hough
Transform concentrates on the location of
a particular class of shape (that defined
by the R-Table). In addition, it is
usually possible to constrain the
permissible values for some (if not all)
of the parameters. Constraining the
location of the reference point is related
to the question of "Where to look".
Constraining the parameters of scale or
rotation is also possible, and certainly
worth doing. Sometimes, the unconstrained
search for a particular shape (such as the
pond in Figure 7f) will result in almost
complete information about the range of
values to be considered in successive
searches. For example, once the pond has
been located (in parameter space,
including location in the image, rotation
and scale) map-like knowledge about this
particular part of the world would allow
the search for other shapes in the scene
to be almost completely determined. Thus,
our technique can both generate and
benefit from such constraints.


Although our current implementation
uses only a simple "bounding rectangle"
constraint on the area of the image to be
searched for boundary information, it is
possible to combine information about the
range of locations for the reference
point, scale, and rotation. When all of
these are sufficiently constrained, then
the R-Table itself provides pointers to
the locations to be searched in the image
for edge-elements.


5 Conclusion

Shape is an important defining
feature of many image objects, often the
only useful feature. The key ideas behind
the Hough Transform have been extended to
produce a shape detection technique which
performs well in the presence of
occlusion, even for completely arbitrary,
non-analytic shapes. As has been
demonstrated, however, the technique
depends strongly on the reliable
estimation of edge-element orientation.




THE  GAUSSIAN  SPHERE: 

A  UNIFYING  REPRESENTATION  OF  SURFACE  ORIENTATION 


John  R.  Kender 


Department  of  Computer  Science 
Carnegie-Mellon University, Pittsburgh, PA 15213


ABSTRACT 

This is an informal, qualitative presentation of some
recent results concerning the representation of surface
orientation. We show that the existing methods using the
gradient space are inadequate in several ways. Using the
Gaussian sphere instead greatly simplifies the mathematics
of many surface orientation problems. One intuitively
satisfying and pedagogically transparent result is that many
existing relations map into latitude lines or longitude lines
on an appropriately oriented Gaussian sphere. This use also
leads to several predictions concerning what surface
orientation properties are the easiest to calculate; they also
seem to be the most important. Some problems still remain,
however. They may be better solvable through the
application of spherical geometry.


INTRODUCTION 

The representation and manipulation of surface
orientation is a critical part of image understanding. Any
domain that has a non-negligible depth component requires
some method of dealing with changes in depth, both sudden
and smooth. Smooth depth changes give rise to surfaces;
surface orientations (local depth changes) have two degrees
of freedom. That is, surfaces can tilt top-to-bottom, and
left-to-right.

On  one  hand,  representing  these  degrees  of  freedom 
is  not  a  problem.  As  long  as  smooth  changes  in  orientation 
are  smooth  in  the  representation,  and  as  long  as  the  space 
is  complete,  one  space  is  mathematically  equivalent  to  any 
other.  On  the  other  hand,  spaces  are  not  all  mathematically 
simple,  nor  are  they  representationally  concise.  We  will 
show  that  the  gradient  space,  currently  the  preferred 
representation, is neither as elegant nor as manipulable as the
Gaussian  sphere  parameterization.  We  will  also  show,  by 
using  the  spherical  representation,  that  many  existing 
results  concerning  surface  orientation  have  the  same 
particularly  simple  form. 

In  this  paper,  many  of  the  mathematical  details  are 
suppressed  for  the  sake  of  presentation.  However,  they 
appear  in  full  detail  in  the  author’s  thesis  [Kender,  1980]. 
The  stress  here  is  on  qualitative  aspects  of  the  two 
representation spaces; this is also reflected in the
qualitativeness  of  the  figures. 


INADEQUACIES  OF  THE  GRADIENT  SPACE 

The gradient space represents surface orientations in
terms of the rate of change of surface depth with respect to
two orthogonal image axes. Basically, the representation
uses the gradient resolved into its two component vectors.

Given the imaging geometry that places the observer
on the negative z axis and the film on the plane z = 1 (see
Fig. 1), let any surface be represented by the equation z =
px + qy + c. Then the rate of change of depth along the x axis
is p, and along the y axis it is q. By the usual definition, the
gradient is (p, q). (For more detail, see [Huffman, 1971] or
[Horn, 1977].)

The gradient space has several elegant properties.
As the image rotates about the line of sight, the gradient
space representation of it similarly rotates. The gradient
space is a dual space to the image space, in which planes
are represented by points, and vice-versa; image lines are
mapped into (different) gradient lines. This simplifies many
representational problems. Scene properties such as
convexity and concavity are also neatly reflected in simple
gradient space relations.


Intrinsic  Inadequacies 

However, the gradient space is only half a space. It
cannot represent surfaces oriented away from the viewer's
main line of sight. Although this is annoying if the image is
taken under orthographic projection (the usual assumption),
it is absolutely critical under perspective. Under
perspective, it is possible to actually see "behind" a surface
(see Fig. 2). Such surfaces are not representable in the
gradient space at all; the fundamental cause is the gradient's
indifference to surface "sidedness". Under orthography,
questions of sidedness never occur. Under perspective they
are rampant and indispensable.

Even strictly under orthography, other related
intrinsic  problems  occur.  For  example,  if  the  gradient  space 
is  used  for  work  with  reflectance,  it  cannot  represent  the 
case  where  the  illuminant  is  in  front  of  the  camera  (shooting 
into  the  sun,  say). 

It  is  not  easy  to  patch  this  deficiency  in  the  gradient 
space.  It  is  possible  to  conceive  of  a  "negative"  gradient 
space,  like  the  existing  "positive"  one,  and  sutured  to  it  at 
infinity.  As  a  surface  became  infinitely  steep,  its  gradient 
representation  would  "cross  over"  into  the  negative  space 
as soon as the surface turned away from the observer.
Light  sources  would  do  the  same  as  they  moved  to  in  front 
of  the  camera.  However,  there  are  solutions  that  are  far 
more  pleasing. 




Inadequacies  In  Application 

The gradient space is also difficult to work with in
practice. Since by definition it is a coordinate system based
on derivatives, it is an infinite space. There is no
straightforward way to deal with its vast expanse in a
computer program.

It is a misshapen space as well. Within its unit circle
are represented all those planes whose overall orientation
away from the viewer (gradient magnitude) is less than 1.
Planes sloping as much as 45 degrees are represented
there. However, very little happens outside this tight little
circle; most of the space is effectively wasted. Almost all of
it is devoted to distinguishing between planes that are small
fractions of a degree away from being parallel to the line of
sight. There is no straightforward way to code around this
either.
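
The uneven allocation of the gradient space can be made concrete with a small sketch. It assumes only the standard relation that a plane whose normal makes an angle s with the line of sight has gradient magnitude tan(s); the function name is illustrative.

```python
import math

def gradient_magnitude(slant_degrees):
    """Gradient magnitude sqrt(p^2 + q^2) of a plane whose normal makes
    the given angle (in degrees) with the line of sight."""
    return math.tan(math.radians(slant_degrees))

# Slants from 0 to 45 degrees fill the unit circle; the last tenth of a
# degree before 90 covers hundreds of units of gradient space.
for slant in (10, 45, 80, 89, 89.9):
    print(f"slant {slant:5.1f} deg -> gradient magnitude {gradient_magnitude(slant):8.2f}")
```

Half of all slant angles thus fall inside the unit circle, while the unbounded remainder of the plane is spent on the other half.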


ADVANTAGES  OF  THE  GAUSSIAN  SPHERE 

Consider now the Gaussian sphere representation of
surface slope. Intuitively, it can be imagined as a
transparent globe about the viewer's head, somewhat like
an oversized fishbowl. It is clear that the surface of the
globe is a two-dimensional space (a manifold); it can be
mapped one-to-one onto the gradient space. However, we
will treat it differently. Given a plane, represent its
orientation by that unique point on the sphere the plane
would touch if it were rigidly translated there through
space. Such a representation has sidedness: opposite sides
of a surface touch opposite sides of the sphere.

More exactly, the Gaussian sphere identifies
orientation not with surface gradient, but with a type of
surface normal. (The normal is "aware" of which side of the
surface the object is on.) Mathematically, it normalizes the
normal vector to length 1, instead of normalizing it to
a vector of the form (p, q, 1). Three-space
vectors of unit length are mapped onto a sphere of unit radius
(the Gaussian sphere) in the straight-forward way; the
point on the sphere has the same coordinates as the normal.

Now center the sphere at the image origin. Consider
the z axis the symmetry axis of a spherical coordinate
system, and the x axis as the usual theta axis. All surface
orientations are now in spherical coordinate form. Since the
radius is uniquely 1, drop it entirely, just as the third
coordinate of the gradient is usually dropped. Instead of
(p, q), we can use (theta, phi). The Gaussian sphere thus
induces a two-angle representation system that
incorporates all possible orientations and directions with
respect to the observer.
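
As a minimal sketch of this change of coordinates, assume the normal is taken proportional to (p, q, 1), phi is the polar angle measured from the z axis, and theta is the azimuth measured from the x axis. These sign and angle conventions, and the function names, are assumptions made for illustration.

```python
import math

def gradient_to_sphere(p, q):
    """Map a gradient (p, q) to angles (theta, phi) on the unit sphere.
    Assumed convention: normal proportional to (p, q, 1), phi measured
    from the z axis, theta measured from the x axis."""
    norm = math.sqrt(p * p + q * q + 1.0)
    nx, ny, nz = p / norm, q / norm, 1.0 / norm
    return math.atan2(ny, nx), math.acos(nz)

def sphere_to_normal(theta, phi):
    """Unit normal for the point (theta, phi); valid for every orientation,
    including those with phi beyond 90 degrees that have no gradient."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

theta, phi = gradient_to_sphere(1.0, 0.0)      # a 45-degree slope along x
back_facing = sphere_to_normal(0.0, 0.75 * math.pi)
```

Every gradient lands on one hemisphere (phi below 90 degrees); the other hemisphere holds the orientations that the gradient space cannot express.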


Preservation  of  Good  Gradient  Space  Properties 

Nothing has been lost, and some has been
gained. All the good properties of the gradient space can
be shown to be preserved. The Gaussian space is smooth.
If the image rotates, it also rotates as the gradient space
does. The dual character of the space remains, and even
the concave-convex relations hold, suitably modified.


Furthermore, if the image tilts (in the strict sense of
the word: if it rotates about the x axis), the representation
tilts isometrically about the sphere surface, too: a new
property. It is also now very easy to find all the planes
that are perpendicular to a given plane. Graphically (Fig. 3),
find the "equator" to the given plane's "pole". The great
circle is the locus of the representations of the planes
perpendicular to the plane represented by the isolated
point.
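
The pole-and-equator construction can be sketched as follows. The helper names are hypothetical; the only fact used is that two planes are perpendicular exactly when their unit normals have zero dot product.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def equator_points(pole, count=8):
    """Sample the great circle of unit normals perpendicular to `pole`,
    i.e. the orientations of planes perpendicular to the given plane."""
    # Any helper axis not parallel to the pole gives a first equator point.
    helper = (1.0, 0.0, 0.0) if abs(pole[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(pole, helper))
    v = cross(pole, u)  # unit length, because pole and u are orthonormal
    points = []
    for k in range(count):
        a = 2.0 * math.pi * k / count
        points.append(tuple(math.cos(a) * u[i] + math.sin(a) * v[i] for i in range(3)))
    return points

pole = normalize((1.0, 2.0, 2.0))
equator = equator_points(pole)
```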


Graphic  Equivalent 

Although the Gaussian sphere is more powerful than
the gradient space, the two are closely related. The
gradient space is easily shown to be the projection of the
Gaussian sphere from its center to one tangent plane.
Under these conditions, lines in the gradient space are
mapped from great circles on the sphere.

As Fig. 4 shows, it is clear why the gradient space is
only half a space: a second plane, parallel to the first,
would be needed to represent all orientations. (This second
plane is the "negative" gradient space.) This same basic
projection is the key to the following observations.

THE  SPHERE  AND  SHAPE  FROM  SHADING 

Reflectance maps represent the way observed
brightness is dependent on surface orientation, given the
position of the illuminant. In at least two well-studied cases
reflectance maps have simple representations on the
Gaussian sphere. Further, the spherical representation
extends existing results.

Under the assumptions of lambertian reflectance the
reflectance map forms a series of nested conic sections.
Horn has observed that the family of curves corresponds to
the cross-section of a properly nested set of cones [Horn,
1977]. However, it is not very difficult to prove a firmer
result. It can be shown that the cones arise by being the
projection of a family of latitude lines on the Gaussian
sphere. The "North Pole" is the illuminant direction, towards
which the pole is tilted. (See Fig. 5.) The percentage of
reflected light is simply the sine of the latitude measured in
degrees, with respect to the pole. Thus, equal spacing of
latitude lines yields equal differences in reflectance. The
equator (a line in the gradient space) corresponds to those
surfaces just beginning to be self-shadowed.
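
A short sketch of the latitude-line property follows. For simplicity it assumes a frame in which the illuminant direction is the +z axis, so the sphere's pole needs no tilting; the function names are illustrative. Lambertian reflectance is computed as the cosine of the incidence angle, which equals the sine of the latitude measured from the equator.

```python
import math

def lambertian_reflectance(normal, light):
    """Cosine of the incidence angle between unit vectors, clamped at zero
    for self-shadowed orientations."""
    return max(0.0, sum(n * s for n, s in zip(normal, light)))

def normal_at(latitude_deg, longitude_deg):
    """Unit normal at the given latitude and longitude on a sphere whose
    pole is the +z axis (assumed here to be the illuminant direction)."""
    lat, lon = math.radians(latitude_deg), math.radians(longitude_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))

light = (0.0, 0.0, 1.0)
# Every orientation on the 30-degree latitude line has the same brightness.
ring = [lambertian_reflectance(normal_at(30, lon), light) for lon in (0, 90, 200)]
```

The reflectance is independent of longitude, so each iso-brightness contour is one latitude line, and the equator is where the value first reaches zero.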

Consider the assumption that reflectance is related to
incidence and emergence angles by the value cos(i)/cos(e).
This is the so-called lunar reflectance relation [Horn, 1977].
Under these conditions, the reflectance map consists of
parallel lines. On the sphere, these map into longitude lines.
The self-shadowing line is again the line of perpendiculars
to the light direction; this time, however, the longitude lines
are parallel to it. An extra advantage of this representation
is that it is easy to find the line along which reflectance is
unity (maximal reflectance). It is simply half the angle (but
not half the gradient space distance) from the origin to the
light direction.
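
Why the lunar map consists of parallel lines can be checked with a brief sketch. It assumes a viewer along the z axis, a surface normal proportional to (p, q, 1), and a light direction proportional to (ps, qs, 1); these conventions and the function name are assumptions for illustration. Self-shadowing (negative values) is not clamped here.

```python
import math

def lunar_reflectance(p, q, ps, qs):
    """cos(i)/cos(e) for surface gradient (p, q) and light gradient (ps, qs).
    The length of the surface normal cancels in the ratio, leaving an
    expression that is linear in p and q."""
    return (1.0 + p * ps + q * qs) / math.sqrt(1.0 + ps * ps + qs * qs)

ps, qs = 1.0, 2.0
base = lunar_reflectance(0.3, 0.1, ps, qs)
# Moving in gradient space perpendicular to (ps, qs) leaves the value unchanged,
# so the contours are straight lines perpendicular to the light gradient.
shifted = [lunar_reflectance(0.3 + 2.0 * t, 0.1 - 1.0 * t, ps, qs) for t in (-3.0, 0.5, 7.0)]
```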

The gradient space method cannot represent
illuminants in front of the observer. Not only can the
spherical system represent them, but it can be shown that
the representation is smooth and straight-forward.
Essentially, the reflectance always arises from latitude lines,
even if the sphere is pointed away from the observer to the
light source; there is no difference in the mathematics.


THE SPHERE AND SHAPE FROM TEXTURE

Normalized texture property maps convey information
very much like reflectance maps; in fact, the latter can be
considered as special cases [Kender, 1979]. Under many
circumstances the texture maps are also made very simple
under the Gaussian representation.

If the textural element chosen for the determination
of surface orientation is an image slope element, and the
texel-texel relationship is parallelism, then the results are
especially simple. This is the situation that occurs with tiled
floors, brick walls, hairbrushes, etc. Under perspective, the
curves that are generated in the gradient space are again a
conic family, as with lambertian reflectance. However, the
parameter that distinguishes one curve from another is the
texel-surface relation.

Suppose one assumes that a surface is perpendicular
to the textural objects defining it. This is the case of, say,
finding the ground plane from the orientation of vertical
trees. Then two texels generate, through their normalized
maps, a polar point: this is, in fact, the vanishing point of
the texture itself. This point orients the sphere, as if the
point were a light source. If, however, the texels are
assumed to lie in the plane they define, the curve generated
is a line in the gradient space (an equator on the sphere)
that is analogous to the line of self-shadow. Intermediate
texel-surface relations generate intermediate curves; the
sine of the angle between texel and surface is analogous to
reflectance. The relation is represented by the "hairy
sphere" of Fig. 6. Note that the images of the texels form
longitude lines that point to the image of the pole in the
gradient space.

The analogy between parallelism under perspective
and lambertian reflectance is not quite exact. The
reflectance map stops at the line of self-shadowing, and a
given reflectance value generates only one curve. However,
a given intermediate-value texel-surface relation generates
two full conic sections: one each for the two ends of a
texel that can contact the sphere. For a full analogy, we
would need for the reflectance map an anti-sun emitting
darkness from a point exactly opposite the true illuminant
position.

Other relationships also map into simple Gaussian
representations. Area under orthography is identical to the
map of lambertian reflectance with the illuminant at the
observer. That is, both are again latitude lines, with the
sphere pointed straight at the origin. (The gradient space
has concentric circles about the origin.) Length under
orthography maps into longitude lines similar to those of the
lunar reflectance case. The actual normalized properties
differ in size, though the orientations of the longitudes are
purely analogous.


THE SPHERE AND OTHER SHAPE METHODS

Woodham has shown that assumptions about occluding
contour can be exploited in the determination of surface
shape [Woodham, 1973]. Specifically, assume an object to
be a right generalized cone; that is, its cross sections are
circles. Then information from the silhouette is sufficient to
create constraints in the gradient space that can be used to
derive local surface orientation.

Unfortunately, his results are expressed in an
object-centered, parametric form. Both p and q are given in terms
of very complex functions of an angle internal to the object;
this angle is measured about the generalized cone's axis. It
can be shown, however, that when converted to the
Gaussian representation--a non-trivial task--the constraint
curves are again latitude lines on the sphere! Further, the
sphere axis is aligned parallel to the generalized cone axis.
Thus, tilting the cone is simply paralleled by tilting the
sphere; the corresponding transformation in the gradient
space is very involved.

One last example of using the sphere deals with
motion. Prazdny has noted that if one had a retina that was
hemispherical, with its focal point at its center, then the
visual flow lines would have a simple form [Prazdny, 1979].
They, too, would be longitude lines (for translations), and
latitude lines (for rotations). Although the retina is not
hemispherical, the imagined fishbowl about the observer's
head has the required properties; the focal point at the
center is served by the observer himself.


GENERAL  COMMENTS 

In all the examples given, the Gaussian sphere
simplified a relation involving surface orientations; in all
cases they were reduced to families of either latitude lines
or longitude lines. Each example also exhibited a simple
property for orienting the Gaussian sphere; the rest fell out
naturally once the pole orientation was found. Likewise,
knowing the image of the equator also is sufficient: the pole
is easily derived from it. This would suggest that
determining the orientation of the pole (i.e. the light
direction, the generalized cone axis, the vanishing point,
etc.) would be most useful to an image understanding
system.

There is a bit of evidence to believe this. With
regard to shading, the polar point is the direction of
maximum reflectivity, and the equator is the line of
self-shadow. One finds that in line drawings, highlights and
shadow lines are regularly drawn in, even if the drawing is
not shaded.

Further, most smoothly curved, illuminated objects
have a "terminator", a line where the shadow begins. It is
undetectable with edge detectors, since it is not a
step-discontinuity in brightness. The human eye, however, is
sensitive to changes in the second derivative of brightness;
these changes characterize the terminator. Perhaps this is
one reason for the ability.


Other  relations,  a  bit  too  complex  to  describe  here, 
also  exist;  again,  many  map  into  either  latitude  or  longitude 
lines  on  an  appropriately  tilted  sphere. 


With regard to texture, finding the pole would be
especially easy under the assumptions of texels being either
perpendicular or parallel to the surface they define--but not
in between. This seems to be the case: it appears to be
difficult to perceive textures formed from oblique texels.


FUTURE RESEARCH

The Gaussian sphere is not good for everything. In
particular, it is awkward for relationships involving
perpendicularities. This appears to be due to the one-to-many
mapping of a plane to its perpendicular counterparts.
(This problem is shared by the gradient space, too.) It also
is inelegant on textures involving perpendicularity, as in
Kanade's skewed symmetry method [Kanade, 1979].
Although the curves generated on the sphere by this method
are well-known (they are "sphero-conics"), they are far
from elegant.

[...] What other relations are also simpler on
the sphere?

A promising way to approach problems of surface
orientation is through spherical geometry. Most surface
orientation methods require the intersection of curves in
orientation space. Capturing the important properties of
these curves under the very different conditions of [...] may
lead to further insights into their representation and
manipulation.


REFERENCES

B. K. P. Horn, ... 1977.

D. Huffman, ... 1971.

T. Kanade, "Recovery of the Three-..." ... 1979.

J. R. Kender, "Shape from Texture: A Computational
Paradigm," Proceedings of the ARPA Image
Understanding Workshop, Science Applications, Inc.,
Apr., 1979.

J. R. Kender, "Shape from Texture," Ph.D. Thesis, Computer
Science Dept., Carnegie-Mellon University, 1980
(forthcoming).

K. Prazdny, "... Relative Depth Map from Optical
Flow," ... Computer Science ..., 1979.

Woodham, "... Techniques ...," Massachusetts Institute of ...

Fig. 1. The Imaging Geometry

Fig. 2. Seeing "Behind" Using Perspective

Fig. 3. Finding Perpendiculars


OBJECT  DETECTION  AND  MEASUREMENT  USING  STEREO  VISION 


Donald  B.  Gennery 

Artificial  Intelligence  Laboratory,  Computer  Science  Department 
Stanford  University,  Stanford,  California  94305 


Abstract 

A  method  of  detecting  and  measuring  objects  for  the  purpose  of 
representing  three-dimensional  outdoor  scenes,  using  data  such 
as  that  obtained  from  stereo  vision  or  from  a  scanning  laser 
rangefinder, is described. Objects are approximated by
ellipsoids. Segmentation of the objects from the background and
from each other is done by finding the ground surface, forming
a preliminary segmentation by clustering the points above the
ground by more than a threshold, fitting ellipsoids to match
these clusters and to avoid obscuring the other points, and
adjusting  the  clusters  according  to  the  fits.  The  method  is 
designed  to  produce  results  useful  for  obstacle  avoidance  and 
navigation  in  an  exploring  vehicle,  such  as  a  Mars  rover.  An 
example  is  given  showing  results  obtained  from  a  stereo  pair  of 
pictures  of  Mars  from  the  Viking  Lander. 


1.  Introduction 

In  a  previous  paper  [1],  a  stereo  vision  system  and  its  possible 
use  in  an  exploring  vehicle  was  described.  This  paper  describes 
further  work  towards  a  complete  system  for  an  exploring  vehicle, 
specifically,  the  use  of  three-dimensional  data  for  the  detection  of 
objects  and  the  measurement  of  their  position,  size,  and 
approximate  shape.  Although  the  three-dimensional  data  could 
be  obtained  from  a  laser  rangefinder,  the  object  detector  is 
designed  to  be  tolerant  of  errors  in  this  data,  such  as  mistakes 
produced  by  incorrect  matches  in  stereo  vision  data  and  poor 
accuracy  of  distances  from  stereo. 

Many  approaches  are  possible  in  describing  the  shapes  of 
objects.  For  example,  generalized  cones  [2]  can  be  used  to 
describe  complicated  objects.  However,  in  some  cases  the 
resolution of the data is insufficient to produce detailed
information  about  the  shape.  In  other  cases,  the  objects  are  so 
irregular  as  to  make  such  detailed  descriptions  very  difficult. 
However,  in  such  cases  information  about  the  position  and  size 
and  some  crude  information  about  the  shape  may  still  be  quite 
useful.  For  example,  for  obstacle  avoidance  in  a  roving  vehicle, 
this  sort  of  information  is  adequate.  Furthermore,  this  sort  of 
information for each object in a large scene containing many
objects amounts to quite detailed information concerning the
whole scene, and thus it would be useful for navigation.

For  these  reasons,  and  because  of  its  convenient  mathematical 
properties,  here  each  object  will  be  approximated  by  an  ellipsoid. 
By  "object”  we  do  not  necessarily  mean  an  actual  physical  object, 
but  merely  a  portion  of  the  scene  that  can  be  reasonably 
approximated  by  an  ellipsoid.  Thus,  if  we  use  as  an  example  a 


vehicle  exploring  Mars,  an  object  may  be  a  single  rock  on  the 
Martian  surface,  two  or  more  adjacent  rocks,  or  merely  a  bump 
in  the  ground.  Also,  an  L-shaped  physical  object  might  be 
represented  as  two  objects. 

This  ellipsoidal  representation  should  be  quite  appropriate  for 
representing  rocks  on  Mars,  because  rocks  probably  tend  to 
resemble more nearly ellipsoids than any other simple shape.
However, it could also be used to represent cars in a parking lot
or trees in a field, for example, especially in aerial photographs
where the resolution may be poor compared to the size of the
objects, and in other cases where precise object description or
recognition is not necessary but rather an overall description of
the  scene  is  desired. 

The  stereo  vision  processing  or  laser  rangefinder  results  in  data 
representing  the  three-dimensional  position  of  a  large  number  of 
points  distributed  over  the  scene.  The  first  step  in  the 
processing  of  this  three-dimensional  data  is  to  find  the  ground 
surface.  A  method  of  doing  this  was  previously  described  [1]. 
In  general,  an  entire  scene  would  be  partitioned  into  small  areas, 
in each of which the ground would be approximated by a plane
or  paraboloid.  Then  points  which  are  above  the  ground  by  a 
sufficient  amount  (depending  on  the  computed  accuracy  of  the 
points,  the  roughness  of  the  ground,  and  the  minimum  size  of 
object  that  is  of  interest)  are  candidates  for  points  on  objects. 
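
The selection of candidate points can be sketched as below, under simplifying assumptions that are not taken from the paper: the z axis points up, the local ground is a plane z = a*x + b*y + c that has already been fitted, height is measured vertically rather than perpendicular to the plane, and the threshold is a single number. The function names are hypothetical.

```python
def height_above_plane(point, plane):
    """Vertical height of the point (x, y, z) above the ground plane
    z = a*x + b*y + c, given as the tuple (a, b, c)."""
    x, y, z = point
    a, b, c = plane
    return z - (a * x + b * y + c)

def above_ground_points(points, plane, threshold):
    """Keep the points that rise above the ground by more than the threshold;
    these are the candidates for points on objects."""
    return [pt for pt in points if height_above_plane(pt, plane) > threshold]

ground = (0.1, 0.0, 0.0)                 # gently sloping ground, assumed known
points = [(0.0, 0.0, 0.02),              # ground noise
          (1.0, 0.0, 0.60),              # on a rock
          (2.0, 1.0, 0.21),              # ground noise on the slope
          (2.0, 2.0, 0.90)]              # on a rock
candidates = above_ground_points(points, ground, threshold=0.25)
```

In practice the threshold would depend on the computed accuracy of the points, the roughness of the ground, and the minimum object size of interest, as described above.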

These  above-ground  points  are  clustered  to  produce  preliminary 
groupings  of  points  which  correspond  roughly  to  objects.  An 
ellipsoid  is  fit  to  each  cluster  by  first  computing  an  initial 
approximation  based  upon  the  moments  of  the  points  in  the 
cluster  and  then  iterating  a  weighted  nonlinear  least-squares 
adjustment  to  fit  the  ellipsoids  to  these  points  and  to  avoid 
obscuring other points. Then, according to the relative positions
of  the  ellipsoids  and  points,  clusters  can  be  broken  or  merged, 
and  the  process  repeats  until  the  apparently  best  segmentation  is 
found.  Each  of  these  steps  will  be  described  in  the  following 
sections. 

The  object  detection  and  measurement  process  as  described  here 
uses only the three-dimensional position information. The
brightness information is discarded after the stereo processing.
However, a more complete system would use both types of
information. Perhaps an edge detector could be applied to the
brightness  data  in  the  regions  near  the  outlines  of  the  ellipsoids 
in  order  to  refine  the  boundaries  of  the  objects,  for  example. 

Matrix notation will be used throughout this paper. Matrices
will be denoted by capital letters. The transpose of a matrix A
will be denoted by $A^T$, and the inverse of A will be denoted by
$A^{-1}$. The trace of a square matrix A (sum of the diagonal
elements, which is equal to the sum of the eigenvalues) will be
denoted by tr(A). A vector in three-dimensional space will be
represented by a 3-by-1 matrix containing the Cartesian
coordinates and will be denoted by X with an appropriate subscript.
(Reference [3] provides a good text on matrix algebra.)

2-  Preliminary  Ciu-tprincr 

abrn^'th  gr0Unfd  surface  Pas  been  determined,  all  points  that  are 
above  this  surface  by  more  than  a  threshold  are  clustered  to 

into  objects.  appr°X,mation  t0  the  mentation  of  the  scene 

Various clustering techniques could be used here. One possibility
is a relaxation method, such as Zucker's [4]. However, [...] the
clustering is done by using the minimal spanning tree of the
points. (The minimal spanning tree is the tree connecting all of
the points such that the sum of the lengths of the edges is a
minimum. [...] by using the nearest neighbor [...]. The length of
the edges of the tree is defined here as the three-dimensional
Euclidean distance between the points.) Then the tree is broken at
every edge whose length is greater than twice the average length of
the adjacent edges [5]. However, a minimum length for an edge to be
broken (related to the resolution of the data) is specified, so
that the method will not be overly sensitive to local fluctuations
in the data. Also, a maximum can be specified, beyond which all
edges are broken.
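The clustering rule just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's program: the tree is built with Prim's algorithm (the paper does not say which construction was used), and the names `minimal_spanning_tree`, `cluster`, `min_break`, and `max_keep` are hypothetical. "Adjacent edges" is taken to mean the other tree edges that share an endpoint with the edge being tested.

```python
import math

def minimal_spanning_tree(points):
    """Prim's algorithm on the complete graph; returns (i, j, length) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n          # shortest known distance from the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        i = min((k for k in range(n) if not in_tree[k]), key=best.__getitem__)
        in_tree[i] = True
        if parent[i] >= 0:
            edges.append((parent[i], i, best[i]))
        for k in range(n):
            if not in_tree[k]:
                d = math.dist(points[i], points[k])   # 3-D Euclidean distance
                if d < best[k]:
                    best[k], parent[k] = d, i
    return edges

def cluster(points, min_break=0.0, max_keep=math.inf):
    """Break long tree edges and return the clusters as lists of point indices.

    min_break: edges shorter than this are never broken by the ratio test.
    max_keep:  edges longer than this are always broken.
    """
    edges = minimal_spanning_tree(points)
    incident = {}
    for idx, (i, j, _) in enumerate(edges):
        incident.setdefault(i, []).append(idx)
        incident.setdefault(j, []).append(idx)
    label = list(range(len(points)))           # union-find forest

    def find(a):
        while label[a] != a:
            label[a] = label[label[a]]
            a = label[a]
        return a

    for idx, (i, j, length) in enumerate(edges):
        adjacent = {e for e in incident[i] + incident[j] if e != idx}
        average = (sum(edges[e][2] for e in adjacent) / len(adjacent)
                   if adjacent else length)
        broken = length > max_keep or (length >= min_break
                                       and length > 2 * average)
        if not broken:
            label[find(i)] = find(j)           # keep the edge: merge the ends
    groups = {}
    for p in range(len(points)):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())
```

For two well-separated rows of points, the single long edge joining them is broken by the ratio test; raising `min_break` above its length keeps the rows together, and a small `max_keep` breaks every edge.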


3. Initial Approximations to Ellipsoids

[...] the iterations do not converge. This initial [...].

Let an ellipsoid be represented by the following matrix equation:

$$(X - X_c)^T W (X - X_c) = 1$$

where $X_c$ is the center of the ellipsoid [...] and $M = W^{-1}$.
(The square roots of the eigenvalues of $M$ are the lengths of the
semi-axes of the ellipsoid.) The [...] the computed moments and the
matrices $X_c$ and $M$ [...] that the object is seen from a single
viewpoint [...]. As an approximation, we assume an orthogonal
projection instead of a [...] projection. Let $X_s$ denote the
vector of normalized first moments [...], let $M_s$ denote the
matrix of second moments about $X_s$ obtained with this
distribution, and let $X_0$ denote the position of the camera.

[For a sphere of radius $r$,] integration shows that in this case
the eigenvalue of $M_s$ corresponding to the eigenvector
$X_0 - X_s$ is $r^2/18$, the other two eigenvalues are $r^2/4$, and
$X_c$ is $X_s$ plus $2r/3$ times the unit vector in the direction
from $X_0$ to $X_c$. All three eigenvalues of $M$ are $r^2$ in this
case. An ellipsoid can be considered to be a distorted sphere
(using stretching and [...]). [...] multiplying $M_s$ by 4 [...],
but this leaves 14 out of the factor of 18 by which we need to
multiply the moments in the direction toward the camera. This
additional amount can be added by adding 14 times the [...] to $M$.
In order to keep the ellipsoid to a reasonable shape when there are
not enough points to determine it well, $M$ as obtained [...] is
averaged with a scalar matrix whose diagonal elements [...]
represents a sphere of radius $r$ with the average [...] right
angles to [...].


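$$X_s = \frac{1}{n}\sum X_p$$

$$M_s = \frac{1}{n}\sum (X_p - X_s)(X_p - X_s)^T$$

$$M_1 = 4M_s + \frac{14\,(X_0 - X_s)(X_0 - X_s)^T}{(X_0 - X_s)^T M_s^{-1} (X_0 - X_s)}$$

$$X_c = X_s - \frac{2\sqrt{2}\,(X_0 - X_s)}{\sqrt{(X_0 - X_s)^T M_s^{-1} (X_0 - X_s)}}$$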


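[Two further equations, defining the radius $r$ of the default
sphere and the averaged matrix $M$, are illegible in the source;
the surviving fragments involve min, max, $\operatorname{tr}(M_1)$,
$(X_0 - X_s)^T (X_0 - X_s)$, and a denominator $n + 4$.]

$$W = M^{-1}$$

where $X_p$ represents any point in the cluster, $n$ is the number of
points in the cluster, the summations are over these points, and $I$
is the 3-by-3 identity matrix.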
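The moment-based starting values can be checked numerically. The sketch below is an illustration under assumptions, not the author's code: it computes $X_s$ and $M_s$, then forms $M_1 = 4M_s + 14\,dd^T/(d^T M_s^{-1} d)$ and $X_c = X_s - 2\sqrt{2}\,d/\sqrt{d^T M_s^{-1} d}$ with $d = X_0 - X_s$. The final averaging with a default sphere is omitted, and the function names are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3-by-3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, v):
    """Solve m y = v for a 3-by-3 matrix by Cramer's rule."""
    d = det3(m)
    result = []
    for col in range(3):
        replaced = [[v[r] if c == col else m[r][c] for c in range(3)]
                    for r in range(3)]
        result.append(det3(replaced) / d)
    return result

def initial_ellipsoid(points, camera):
    """Return (Xc, M1) from the first and second moments of the points."""
    n = len(points)
    xs = [sum(p[k] for p in points) / n for k in range(3)]          # Xs
    ms = [[sum((p[r] - xs[r]) * (p[c] - xs[c]) for p in points) / n
           for c in range(3)] for r in range(3)]                    # Ms
    d = [camera[k] - xs[k] for k in range(3)]                       # X0 - Xs
    y = solve3(ms, d)                                               # Ms^-1 d
    q = sum(d[k] * y[k] for k in range(3))                          # d^T Ms^-1 d
    m1 = [[4 * ms[r][c] + 14 * d[r] * d[c] / q for c in range(3)]
          for r in range(3)]
    xc = [xs[k] - 2 * math.sqrt(2) * d[k] / math.sqrt(q) for k in range(3)]
    return xc, m1
```

A quick consistency check: six points whose moments equal those quoted for a sphere of radius 3 viewed along the z axis (second moments 9/4, 9/4, and 9/18 about a mean that lies 2 units in front of the center) should give back the center and $M_1 = 9I$.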


4. Iterative Solution for Ellipsoids

The adjustment of the ellipsoids is done by a modified
least-squares approach. Each ellipsoid is adjusted so as to
minimize the weighted sum of the squares of two kinds of
discrepancies: the amounts by which the points (usually points
in the cluster being fit) miss lying in the surface of the ellipsoid, and
the amounts by which the ellipsoid hides any points as seen from the
camera position. (In the latter case, the discrepancies actually
should be considered separately for each camera that sees the
point in question. However, for narrow-angle stereo we use as a
reasonable approximation the assumption that the "camera" is at
the midpoint of the stereo baseline.) Including the second kind
of discrepancy is useful in helping to determine the size and
shape of the object when the points on the object itself do not
contain sufficient information. Also included in the weighted
sum of the squares to be minimized are a priori terms which tend to
force the ellipsoid by default to become a sphere near the ground
when the points do not constrain it well.

The first kind of discrepancy above optimally should be defined
as the shortest distance from the point in question to the
surface of the ellipsoid. However, computing this requires
solving a sixth-degree equation. Therefore, as an expedient the
distance between the point and the surface along a straight line
from the center of the ellipsoid to the point is used instead. In
order to be consistent with this definition, the second kind of
discrepancy is defined as follows. The midpoint of the two
intersections of the surface of the ellipsoid with a line from the
camera to the point is first found. Then the discrepancy of the
first kind is computed for this midpoint. Both kinds of
discrepancies are illustrated in Figures 1 and 2.

Now we must consider exactly for which points which kind of
discrepancy is computed. There are five regions of space to
consider, according to whether the point is to the side of the
ellipsoid as seen from the camera (that is, the line through the
camera and the point does not intersect the ellipsoid), is
in front of the ellipsoid as seen from the camera, is inside the
front portion of the ellipsoid (in front of the surface of
midpoints as defined above), is inside the back portion of the
ellipsoid, or is behind the ellipsoid. Also, there are two kinds of
points to consider, according to whether or not the point is in the
cluster which is assumed to correspond to this object. This
produces ten combinations in all, which are illustrated in Figures
1 and 2. They divide into four categories.

First, if the point is not in the cluster and is either in front of the
ellipsoid [...], there is no discrepancy, and the point is not used.

Second, if the point is in the cluster and is either in front, inside
the front half, [...] as seen from the camera, or the point is not in
the cluster [...] half, the first kind of discrepancy is used.

Third, [...] and is behind the [...].

Fourth, [...] as two points in the computations. This is because
there are two separate components of error in this case: [...]
bulges out in back [...].


Figure 1. Two-dimensional version of adjustment, showing points
belonging to this cluster. Solid dark straight lines are
discrepancies to be minimized.



The equation of the ellipsoid is

$$(X - X_c)^T W (X - X_c) = 1$$

and the equation of a straight line through the camera position
$X_0$ and the point in question $X_p$, in parametric form, is

$$X - X_0 = u\,(X_p - X_0)$$

These can be combined to produce

$$[u(X_p - X_0) + X_0 - X_c]^T\, W\, [u(X_p - X_0) + X_0 - X_c] = 1$$

the roots of which determine the intersections of the line and
ellipsoid. This equation is equivalent to

$$a u^2 + b u + c = 0$$

where

$$a = (X_p - X_0)^T W (X_p - X_0)$$

$$b = 2\,(X_p - X_0)^T W (X_0 - X_c)$$

$$c = (X_0 - X_c)^T W (X_0 - X_c) - 1$$

The roots of the above equation in $u$ determine the region of
space in which $X_p$ lies. If the roots are imaginary ($b^2 - 4ac < 0$),
the point is to the side of the ellipsoid. If the average of the two
roots minus unity is positive, the point is in front of the midpoint
surface; if negative, it is behind the surface. If the roots are real,
the point is in front of, behind, or inside the ellipsoid according
to whether both roots are greater than unity, both roots are less
than unity, or unity lies between the roots, respectively.
Alternatively, we can use the fact that the point is outside of the
ellipsoid if and only if $(X_p - X_c)^T W (X_p - X_c) > 1$.
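The root test translates directly into code. The following sketch is only an illustration of the five-region classification described above; the function names and the region labels are hypothetical, and the degenerate case in which the point coincides with the camera (so that $a = 0$) is not handled.

```python
import math

def quad_form(u, w, v):
    """Return u^T W v for 3-vectors u, v and a 3-by-3 matrix W."""
    return sum(u[r] * w[r][c] * v[c] for r in range(3) for c in range(3))

def classify(xp, x0, xc, w):
    """Region of space containing point xp, seen from camera x0.

    The ellipsoid is (X - xc)^T W (X - xc) = 1. The point itself sits at
    u = 1 on the camera-point line, so the roots are compared with unity.
    """
    d = [xp[k] - x0[k] for k in range(3)]      # Xp - X0
    e = [x0[k] - xc[k] for k in range(3)]      # X0 - Xc
    a = quad_form(d, w, d)
    b = 2.0 * quad_form(d, w, e)
    c = quad_form(e, w, e) - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return "side"                          # line misses the ellipsoid
    root = math.sqrt(disc)
    u1, u2 = (-b - root) / (2 * a), (-b + root) / (2 * a)   # u1 <= u2
    if u1 > 1.0:
        return "front"                         # both roots exceed unity
    if u2 < 1.0:
        return "behind"                        # both roots below unity
    # Unity lies between the roots: inside; split at the midpoint surface.
    return "inside front" if -b / (2 * a) > 1.0 else "inside back"
```

With a unit sphere at the origin and the camera at (0, 0, 10), points at z = 5, 0.5, -0.5, and -5 on the axis fall in the front, inside-front, inside-back, and behind regions respectively, and a point at (3, 0, 0) is to the side.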


The discrepancy of the first kind is

$$\varepsilon = \sqrt{(X_p - X_c)^T (X_p - X_c)}\left(1 - \frac{1}{\sqrt{(X_p - X_c)^T W (X_p - X_c)}}\right)$$

In order to compute the discrepancy of the second kind, the
midpoint of the intersections of the camera-point line with the
ellipsoid is first obtained as follows:

$$X_u = -\frac{b}{2a}\,(X_p - X_0) + X_0$$

Then the discrepancy of the second kind is

$$\varepsilon = \sqrt{(X_u - X_c)^T (X_u - X_c)}\left(1 - \frac{1}{\sqrt{(X_u - X_c)^T W (X_u - X_c)}}\right)$$
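Both discrepancies reduce to one radial measurement, applied either to the point itself or to the chord midpoint. The sketch below illustrates this; the function names are hypothetical, and the code does not guard against the midpoint coinciding with the center of the ellipsoid, where the radial direction (and hence the expression) is undefined.

```python
import math

def quad_form(u, w, v):
    """Return u^T W v for 3-vectors u, v and a 3-by-3 matrix W."""
    return sum(u[r] * w[r][c] * v[c] for r in range(3) for c in range(3))

def radial_discrepancy(x, xc, w):
    """Signed distance from x to the surface along the ray from the center.

    Positive outside the ellipsoid, negative inside.
    """
    r = [x[k] - xc[k] for k in range(3)]
    length = math.sqrt(sum(v * v for v in r))
    return length * (1.0 - 1.0 / math.sqrt(quad_form(r, w, r)))

def first_kind(xp, xc, w):
    """Discrepancy of the first kind: radial miss of the point itself."""
    return radial_discrepancy(xp, xc, w)

def second_kind(xp, x0, xc, w):
    """Discrepancy of the second kind: radial miss of the chord midpoint Xu."""
    d = [xp[k] - x0[k] for k in range(3)]      # Xp - X0
    e = [x0[k] - xc[k] for k in range(3)]      # X0 - Xc
    a = quad_form(d, w, d)
    b = 2.0 * quad_form(d, w, e)
    u_mid = -b / (2.0 * a)                     # average of the two roots
    xu = [x0[k] + u_mid * d[k] for k in range(3)]
    return radial_discrepancy(xu, xc, w)
```

For a unit sphere the second kind equals the distance from the center to the camera-point line minus one, which gives an independent check on the midpoint formula.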


Because there may be erroneous points in the data, points which
have large discrepancies relative to the size of the ellipsoid are
given less weight in the solution. The weighting function used is

$$\omega = \frac{1}{\dfrac{1 + 2\left(\sqrt{(X - X_c)^T W (X - X_c)} - 1\right)^2}{\operatorname{tr}(W)} + \sigma^2}$$

where $X$ is $X_p$ or $X_u$ for discrepancies of the first or
second kind, respectively, and $\sigma$ is the component of standard
deviation of measurement errors in $X_p$ propagated into the
discrepancy. (If these are unknown, $\sigma$ can be zero.) Thus the
dimensionless quantity to be minimized (by adjusting $X_c$ and $W$)
is $\sum \omega \varepsilon^2$ plus some additional terms for a priori
values yet to be discussed. (Actually, this quantity is minimized only
with respect to the effects of $X_c$ and $W$ acting through
$\varepsilon$ and not their effects through $\omega$.)

In order to solve the above nonlinear problem, the
Gauss method [6] is used. This method is equivalent to using
the partial derivatives of the discrepancies to approximate the
nonlinear problem by a linear statistical model [7], solving the
linear problem, and iterating this process until it converges.

On each iteration the following is done. The current values
of $X_c$ and $W$ are used to compute for each point the value of
$\varepsilon$ as defined above and the 1-by-9 matrix $P$, which
consists of the partial derivatives of $\varepsilon$ with respect to
the three elements of $X_c$ and the six unique elements of $W$
($W$ is symmetrical). Then the following summations over all of the
points are computed, in which each point in the first category above
is not used, each point in the second or third categories appears
once, and each point in the fourth category appears twice:

$$H = H_0 + \sum \omega P^T P$$

$$C = C_0 + \sum \omega P^T \varepsilon$$

($H_0$ and $C_0$ are used for the a priori values, yet to be discussed.)
Then the 9-by-1 matrix of corrections is

$$D = \nu H^{-1} C$$

where $\nu$ is a factor used to improve convergence because of the
very nonlinear nature of the problem. (Currently $\nu = 0.5$ on
early iterations, but $\nu = 1$ after a test indicates that this will
produce [...] convergence.) The elements of $D$ are
subtracted from the corresponding elements of $X_c$ and $W$ to
obtain the improved approximations for the next iteration.
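The damped update $D = \nu H^{-1} C$ is easiest to see on a smaller problem. The sketch below applies the same iteration to a two-dimensional analogue, fitting a circle (center and radius, three parameters instead of nine) with the radial discrepancy. It is an illustration under assumptions rather than the author's procedure: all weights are taken as $\omega = 1$, the a priori terms $H_0$ and $C_0$ are zero, and $\nu$ switches from 0.5 to 1 after a fixed five iterations because the paper's convergence test is not specified. All names are hypothetical.

```python
import math

def solve(h, c):
    """Solve H d = c by Gauss-Jordan elimination with partial pivoting."""
    n = len(c)
    aug = [row[:] + [c[i]] for i, row in enumerate(h)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def gauss_step(params, points, nu):
    """One damped Gauss iteration for the circle parameters (cx, cy, r)."""
    cx, cy, r = params
    h = [[0.0] * 3 for _ in range(3)]
    c = [0.0] * 3
    for x, y in points:
        dist = math.hypot(x - cx, y - cy)
        eps = dist - r                                   # discrepancy
        p = [-(x - cx) / dist, -(y - cy) / dist, -1.0]   # partials of eps
        for i in range(3):
            c[i] += p[i] * eps                           # C = sum P^T eps
            for j in range(3):
                h[i][j] += p[i] * p[j]                   # H = sum P^T P
    d = solve(h, c)                                      # H^-1 C
    return [v - nu * dv for v, dv in zip(params, d)]     # subtract nu * D

def fit_circle(points, start, iterations=30):
    """Iterate with nu = 0.5 on the first five passes and nu = 1 afterwards."""
    params = list(start)
    for k in range(iterations):
        params = gauss_step(params, points, 0.5 if k < 5 else 1.0)
    return params
```

The half-length early steps trade speed for safety when the starting values are poor; once the estimate is close, full steps converge rapidly.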

Now, the a priori values will be discussed. In some cases the
points affecting the ellipsoid will be insufficient in number or
insufficiently distributed to determine all parameters of the
ellipsoid very well. It is therefore desirable to have a priori
values for some of the parameters with appropriate weight in the
[...] constrain [...] to reasonable default values when the
points do not contain sufficient information. When there is
[...], the a priori values will have
very little effect because of their small weight. The a priori
[...] the ground surface height directly
under [...] for the vertical component of $X_c$, with weight
[...]; [...] the diagonal elements of $W$, with weight
$\operatorname{tr}(M)^2/10$; and zero for the off-diagonal elements
of $W$, with weight $\operatorname{tr}(M)^2/10$, where $M = W^{-1}$.
(Including $\operatorname{tr}(M)$ as shown
scales things correctly so that the solution is invariant under a
scale factor change.) The effect of the $W$ terms is to try to force
the ellipsoid into a spherical shape. These a priori terms are put
[...] in the following way. The diagonal element of
$H_0$ corresponding to the vertical component of $X_c$ is
$0.1/\operatorname{tr}(M)$, the three diagonal elements corresponding
to the off-diagonal elements of $W$ are each
$\operatorname{tr}(M)^2/10$, and the 3-by-3 submatrix on
the diagonal of $H_0$ in the position corresponding to the diagonal
elements of $W$ consists of 2/3 on its main diagonal and -1/3
[...]. All other elements of the
9-by-9 matrix $H_0$ are zero. Then

$$C_0 = H_0 C$$

where $C$ is a column matrix of the current values of $X_c$ and $W$,
[...] the height of the ground directly under the
center of the ellipsoid subtracted from the element of $C$
corresponding to the vertical component of $X_c$. $H_0$ and $C_0$ are
used in the summations for $H$ and $C$ as previously shown.

5.  Breaking  and  Merging  Clusters 

Since the preliminary clustering is dependent on local
information, it may not produce the best segmentation based on
[...] information. Therefore, after ellipsoids have been
[...]. For each edge in the [...] minimal spanning tree that connects
this cluster, the quantity $\lambda(1 - \rho)$ is computed, where
$\lambda$ is the length of the edge and $\rho$ is the minimum of
$\sqrt{(X_p - X_c)^T W (X_p - X_c)}$ for
the two points connected by the edge. Then the cluster is
tentatively broken at the [...] each new cluster formed
[...] connecting points that are outside the ellipsoid.
If this new clustering [...] minimum value for this quantity [...].

[The expression used to compare alternative clusterings is illegible
in the source.] [...] defined [...] previous section, $n$ is the
number of points in the cluster [...] ellipsoid, and $m$ is the
number of points below the [...] threshold but directly above
the ellipsoid. If the initial approximation is used as the result,
$\varepsilon$ and $\omega$ are obtained [...] iteration, and the first
denominator is $n$ instead of [...].

[...] is chosen if the [...] two values of [...]
cluster. Otherwise, the single [...].

6. Results

[...] Mars. Each picture is 256 pixels by
256 pixels. [...] in azimuth and
elevation. (Thus the [...] about 10 degrees.) The
azimuth and elevation from the [...] of the
picture are about [...] degrees [...].

[...] eight pixel-square window which was
[...] picture. (Thus the
resolution of the reduced stereo data [...].)

[...] computed for this point to a
nominally horizontal reference plane [...] meters below the
cameras, with the base of [...] and are then
projected into the picture. Fig. 5 is the same data as Fig. 4
except that the bases of the arrows rest on a ground plane
computed from this data by the ground surface finder. Fig. 6 is
the same as Fig. 5 [...] 5 centimeters [...] computed [...].

[...] The maximum distance for connecting
points was 20 centimeters. [...] zero and infinity for this
minimum and maximum [...] a different initial
clustering but identical final results.


Figure 3. Stereo pair of Martian surface.


Figure 4. [Caption illegible in source.]


Figure  5.  Heights  above  computed  ground  plane. 


Figure 7. Minimal spanning tree connecting points above 5 cm,
vertical view. Solid lines show initial clusters.



Figure  6.  Heights  above  computed  ground  plane,  for  points  above 
5  cm. 


Figure  8.  Ellipsoids  fit  to  initial  clusters. 


Figure  9.  Ellipsoids  fit  to  final  clusters. 


Fig. 7 shows the points in Fig. 6 in a nominally vertical
orthogonal projection (perpendicular to the reference plane).
The figure covers an area one meter by one meter. The symbol
for each point represents height in centimeters above the ground
plane, with the letters A, B, C, etc. representing the values 10,
11, 12, etc. The points are connected to show the minimal
spanning trees that were computed. Solid lines connect points
within each initial cluster.

Fig. 8 shows the ellipsoids that were fit to the initial clusters.
Each ellipsoid is represented by two ellipses. One ellipse is the
orthogonal projection of the ellipsoid onto the reference plane.
The other ellipse is the intersection of the ellipsoid with a plane
through the center of the ellipsoid and parallel to the reference
plane. Only the clustered points are shown here, as in Fig. 7.
However, as previously described, any of the points shown in
Fig. 4 may have been involved in the adjustment of the
ellipsoids. Remember that the fit is done in three dimensions,
whereas Fig. 8 shows a two-dimensional projection.

Fig. 9 shows in the same way the results of the breaking and
merging operations. The two clusters in the center
(corresponding to the large rock in the center of the pictures)
were merged into one, and a new ellipsoid is shown for this
cluster. The other clusters were not changed.

These results were projected into the left picture to produce Fig.
10. The outlines of the ellipsoids as they would be seen from the
left camera are superimposed on the picture. The lengths of the
principal axes of the large ellipsoid in the center are 36.5, 30.5,
and 19.8 centimeters.


Figure 10. Ellipsoids fit to final clusters, projected into left
picture.


Acknowledgments 

This work was performed under NASA contract NASW-29IC
and the Defense Advanced Research Projects Agency contract
MDA903-76-C-0206. The Mars pictures were supplied by
Elliott Levinthal and Sidney Liebes of the Viking Lander
Imaging Team. This paper was first presented at the Sixth
International Joint Conference on Artificial Intelligence, held in
Tokyo in August 1979.

References 


[1] Gennery, D.B. "A Stereo Vision System for an Autonomous
Vehicle." In Proc. IJCAI-77. MIT, Cambridge, Mass., August
1977, pp. 576-582. (Also IU Workshop, October 1977.)

[2] Nevatia, R. & Binford, T.O. "Description and Recognition of
Curved Objects," Artif. Intell. Vol. 8, February 1977, pp. 77-98.

[3] Hohn, F.E. Elementary Matrix Algebra (Third Edition). The
MacMillan Company, 1973.

[4] Zucker, S.W. "Relaxation Labelling and the Reduction of
Local Ambiguities," In Proc. Third International Joint Conf. on
Pattern Recognition. San Diego, Calif., November 1976, pp.
852-861.

[5] Duda, R. & Hart, P. Pattern Classification and Scene Analysis.
Wiley, 1973.

[6] Bard, Y. Nonlinear Parameter Estimation. Academic Press,
1974.

[7] Graybill, F.A. An Introduction to Linear Statistical Models,
Volume I. McGraw-Hill Book Company, 1961.



EDGE  BASED  STEREO  CORRELATION 


H.  Harlyn  Baker 


Artificial  Intelligence  Laboratory,  Computer  Science  Department 
Stanford  University,  Stanford,  California  94305 




Abstract 

An edge based approach to stereo depth measurement is
described. Its processing consists of extracting edge descriptions
of a pair of images, linking these edges to their nearest (projective)
neighbors to obtain the connectivity structure of the images,
correlating the edge descriptions in an epipolar reference frame on
the basis of local edge properties (here, assuming the input images
are registered such that scanlines correspond), and cooperatively
removing those edge associations formed by the correlation which
violate the connectivity structure of the two images.

Edge  Correlation 

The long term interest of this research is in enabling
a computer to build 3-dimensional models of the components of
its environment. To build such models one must have an
automatic process for obtaining three-dimensional information
from a scene. This is the immediate aim of the work
described here - to develop a vision system capable of obtaining
an accurate and reliable edge based depth map of a scene from
stereo pairs of views of it.

Accurate and reliable determinations of the sort needed
here require exploitation of the semantic redundancy
in the information available to the sensors. The approach
to be outlined uses this - intermixing local and global
constraints, constraints derived from observations on the
imaging process, in seeking a 3-dimensional interpretation. A
correlation procedure chooses the best correspondence of
the images using local constraints on a scanline-by-scanline
basis, and a cooperative consistency enforcement process
works to assure 3-space connectivity using the global
constraint of projective connectivity.

The 3-D correlation was chosen to be edge based. This
is because of the higher accuracy associated with edge
positioning than with area matching, the reduced computation
and combinatorics in dealing with edges rather than
with area templates, and the desire (at least initially) to
work with those parts of the image (and hence of the scene)
with the greatest information content - the locus of intensity
contrasts between surfaces.


Edges, as they are defined here, occur at those places in
an image where the second derivative from a 7x1 or 1x7
bar operator (with lateral inhibition) crosses zero. Each
edge has a two dimensional slope in the image plane (only
those edges with a vertical component in their slope are
used in the correlation), a contrast across it, average and
local intensity measures of the intervals to its left and right,
2-D connectivity to other edges on prior or subsequent lines,
sub-pixel positioning, a measure of the extent of its breadth,
and, increasing across the image, an index.
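The zero-crossing definition with sub-pixel positioning can be illustrated on a single scanline. The sketch below is a simplified stand-in, not the operator used in the paper: the coefficients of the 7x1 bar operator and its lateral inhibition are not given, so a plain three-point second difference is assumed instead, and the crossing is located by linear interpolation between the two samples of opposite sign. The function names are hypothetical.

```python
def second_difference(scanline):
    """Discrete second derivative; entry k of the result refers to pixel k + 1."""
    return [scanline[k - 1] - 2 * scanline[k] + scanline[k + 1]
            for k in range(1, len(scanline) - 1)]

def zero_crossing_edges(scanline):
    """Return (sub-pixel position, contrast) for each zero crossing.

    The contrast is the intensity step across the pixel pair that brackets
    the crossing; its sign distinguishes dark-to-light from light-to-dark.
    """
    second = second_difference(scanline)
    edges = []
    for k in range(len(second) - 1):
        s0, s1 = second[k], second[k + 1]
        if s0 * s1 < 0:                            # strict sign change
            position = (k + 1) + s0 / (s0 - s1)    # linear interpolation
            contrast = scanline[k + 2] - scanline[k + 1]
            edges.append((position, contrast))
    return edges
```

For the scanline `[10, 10, 10, 11, 18, 27, 30, 30, 30]` the second difference changes sign between pixels 4 and 5, giving one edge at position 4.25 with contrast 9. The edge's index is then simply its rank in the returned list.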

[Figure 1 labels: average intensity of interval to right; intensity
locally to right; links to nearest neighbors; slope; (sub-pixel)
position; edge's index = i; intensity locally to left; average
intensity of interval to left]

Edge properties
Figure 1

It  is  these  properties  of  edges  along  scanlines  in  the 
two  images  which  provide  the  metric  for  the  correlation  (l). 


[...] mark intensity values, vertical lines are edges
First and second image scanline edge depictions
Figure 2



Edge based descriptions of images are also generally
more structured than are area based ones, as the linking
phase of the process establishes edge connectivities in the
two-dimensional image. This connection in the image
plane, suggesting connectivity in the actual scene, provides
the semantic component for a cooperative process to determine
the best 3-D interpretation of the scene's edge depiction
- the line-by-line edge correlation procedure chooses
the best association of first image edges with second image
edges from the available local information, then edge
connectivity is used to either verify or repudiate these pairings.

Experimentation was done with two basic approaches
to the correlation, contrast based and half-edge based. This
report will describe the genesis of the present system as
it progressed through these approaches. Accompanying
these changes in approach was a change in the basic
computation process from a branch&bound search to a
Viterbi [2] dynamic programming algorithm. This computation
change was brought about by the realization that
the analysis of busier images (such as the Night Vision
Lab imagery), use of fewer ad hoc'isms in parameter settings
(through measures of image statistics for finer control
of noise based thresholds), and, as always, the search
for greater generality, all lead to increased combinatorics.
Although I will begin with a description of contrast based
branch&bound, much of what is described applies to both
correlation approaches and both methods of correlation
computation.


Second image edges
Figure 3a


Second image connectivity
Figure 3b


First image edges
Figure 3c


First image connectivity
Figure 3d



Preparing  for  the  Correlation 


Branch and Bound Correlation


Initially, for the branch&bound correlation, edges
were grouped by their contrast sign and ordered by their
strengths. An edge from the first image would be said to
be a possible mate to an edge from the second if its sign
was the same, their strengths similar, at least one of the
surfaces bounding them to the left or the right being similar
enough in average intensity in each view to be considered
'the same', and their relative positions (indices) being close
enough that to associate them together would not exclude
too many pairings from the correlation (implied error). The
insistence that the edge contrast signs be the same forbids
the matching of an edge from say a grey surface projected
against a white background with itself from a different view
against a black ground (this restriction is not present in the
half-edge based approach).

A list of such possible mates is then formed for each
edge of the second image scanline and ordered by a sum of
normalized scores of:

(difference in contrast)²,

(difference in intensity of surfaces to sides)²,

implied error (missed edges, if this association is made),

and a final nonlinear component putting edge matings
whose left and right surfaces are both matchable to the head
of the list. Correlation, then, is the process of finding the
'best' set of possible pairings of the edges.


What drives the correlation is a search for 'explanation'
of the greatest number of edges in the scene (an
'explained' edge is one which has been unambiguously
positioned in 3-space), and this metric provides the most
effective element of the pruning technique (for indeed, the
combinatorics can become far too great to allow unbounded
expansion of the search tree). A running count of the
implied error (edges which cannot be correlated) is maintained
at each stage of the expansion of the correlation,
and any partial assignment giving rise to an implied error
above that of the 'best-so-far' (initially requiring 50% of
edges to potentially match) is denied further expansion at
that point.
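The pruning rule can be sketched as a small depth-first search. This is a simplified illustration under assumptions, not the system described: implied error is counted only as the number of second-image edges left unpaired, pairings are required to preserve left-to-right order (the paper does not state its ordering constraint explicitly), and a branch is cut as soon as its error can no longer improve on the best complete assignment. The name `best_pairing` and the `candidates` structure are hypothetical.

```python
def best_pairing(candidates):
    """Branch-and-bound search over edge pairings along one scanline.

    candidates[j] lists, in preference order, the first-image edge indices
    that are possible mates for second-image edge j. Returns (pairs, error),
    where pairs holds (first_index, second_index) tuples and error counts the
    second-image edges left unexplained. pairs is None when no assignment
    explains at least half of the edges.
    """
    n = len(candidates)
    best = {"error": n // 2 + 1, "pairs": None}    # initial 50% requirement

    def expand(j, last_used, error, pairs):
        if error >= best["error"]:
            return                                 # bound: prune this branch
        if j == n:
            best["error"], best["pairs"] = error, list(pairs)
            return
        for i in candidates[j]:
            if i > last_used:                      # keep left-to-right order
                pairs.append((i, j))
                expand(j + 1, i, error, pairs)
                pairs.pop()
        expand(j + 1, last_used, error + 1, pairs) # leave edge j unexplained

    expand(0, -1, 0, [])
    return best["pairs"], best["error"]
```

Because candidate lists are tried in preference order, the first complete assignment found tends to be a good one, which tightens the bound early and keeps the number of expanded branches small.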

This first approach, contrast based correlation, showed
itself to be an acceptable technique with the data used for
its development (Figure 4 demonstrates the quality of an
early contrast based branch&bound correlation, when run
with unrealistically high thresholds on a fairly noise-free
non-busy stereo pair), but its shortcomings - not allowing
contrast reversals to match on one side or another (such as
grey moving from a white to a black ground), and requiring
the actual step (contrast) to be of similar size - made it
necessary to consider other approaches for a more general
solution.


[...] the first image [...]. The figure to the right is [...] following
the connectivity [...] drawing lines between those edges in that image
which have been associated by the [...] in the second image (i.e.,
they have [...] in the other image), but rather than [...]
coordinates, the coordinate reference frame of the second image is
used. This means that when following a connected set of edges in the
first image, say the back of the hand, everything will look fine as
viewed from the other image until an edge associated with something
[...] the back of the hand is encountered. At this point a line will
be drawn to whatever part of [...] associated with, producing a
noticeable horizontal jag in that part of the image.


Correlation results
prior to cooperative connectivity enforcement
Figure 4



The second approach, half-edge based correlation, was
developed to allow the possibility of contrast-reversed edges
matching on one side or another. Here, the sign and
strength of edges are ignored, and only the ordering
parameters (as specified above) are used in forming lists
of possible mates for each second image scanline edge.
Unfortunately, there being in effect both a left and a right
edge to deal with where before there was only one, the
combinatorics tend to get way out of hand (a typical scanline
in the Night Vision Lab imagery has about 30 edges, and
the size of the search space on some of these was found to
be up into the billions).

The combinatoric problem is not confined to half-edge
based correlation, as in truth it is necessary with
both approaches to limit the computation before letting
the branch&bound correlation loose. This trimming was
done, with both the contrast and the half-edge based
approaches, by removing from consideration those scanline
edges (either first or second image) of lowest contrast
(where it is presumed that edges of lower contrast are less
significant to the modelling) until the total estimable
combinatorics passed under some prespecified maximum (50000
permutations). (Implied error bounding rarely let the actual
number of permutations required exceed the hundreds.)
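The trimming step can be sketched as follows. This is only an illustration under stated assumptions: each edge is represented by its contrast value alone, and `estimate_combinatorics` is a hypothetical stand-in (every second-image edge may mate with any first-image edge or stay unmatched) for the estimate actually used, which the text does not specify.

```python
MAX_PERMUTATIONS = 50_000  # cap quoted in the text


def estimate_combinatorics(first, second):
    """Hypothetical estimate of the search-space size (an assumption):
    each second-image edge may pair with any first-image edge or none."""
    return (len(first) + 1) ** len(second)


def trim_by_contrast(first, second, cap=MAX_PERMUTATIONS,
                     estimate=estimate_combinatorics):
    """Drop the lowest-contrast edge (from either image) until the
    estimated combinatorics fall to or below the cap."""
    first, second = list(first), list(second)
    while (first or second) and estimate(first, second) > cap:
        # Tag each edge with its source list so the weakest can be removed.
        candidates = [(abs(c), 0, i) for i, c in enumerate(first)]
        candidates += [(abs(c), 1, i) for i, c in enumerate(second)]
        _, which, idx = min(candidates)
        (first if which == 0 else second).pop(idx)
    return first, second
```

With a deliberately small cap, `trim_by_contrast([10, 3, 8, 1, 6], [9, 2, 7, 5], cap=100)` discards the three weakest edges (contrasts 1, 2 and 3) before the estimate drops under the cap.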

Now, with either of the two approaches the scanline
correspondence having the greatest number of 'explained' edges
(in case of equivalent counts, the lowest sum of squared
intensity difference at sides of edges) is chosen as the result
of the correlation. This is a set of pairs (see Figure 5):

$$\{\, (f_i, s_j) \mid f_i \text{ correlates with } s_j;\;\; i, j \text{ even} \rightarrow \text{left of edge};\;\; i, j \text{ odd} \rightarrow \text{right of edge} \,\}$$

Clearly there will be miscorrelations among these -
experience, and a little thought, have shown that they can
surely be expected to occur near the periphery of the
scanline, where the need to correlate bounding edges is not
present (the global constraint of maximizing the edge
pairings has diminishing effect near the sides, where the relative
displacement of the images means no correlation exists).
And of course there will always be edges that do look alike.
With this local ambiguity, one can expect to always have
incorrectly assigned edge pairs. It is up to a further analysis
with more global information to remove these
miscorrelations.




{(2,  4)  (5,  5)  (6,  6)  (7,  7)  (10,  10)  (11,  11)  (12,  12)  (13,  13)  (14,  14) 
(15,  15)  (16,  16)  (17,  17)  (21,21)  (22,  22)  (23,  23)  (24,  24)  (25,  25) 
(26,  26)  (27,  27)  (30,  30)  (35,  33)  (36,  34)  (37,  35)  (41,  37)  (42,  40) 
(43, 41) (44, 42) (45, 43) (46, 44) }


Set of associated pairs (as Figure 2)
f - first image, s - second image
Figure  5 


Cooperative  Continuity  Enforcement 

To the rescue comes a depth continuity enforcement
process operating in a cooperative mode upon the edge
pairings assigned by the correlation. It follows connected edges
in the two image planes, removing those edge pairings that
it finds to be inconsistent. An inconsistent pairing, in this
sense, is one whose edges are nearest connected in 2-space
(as seen in either image) to edges which have been paired
differently by the correlation. This conflict in correlation
is a necessary condition for inconsistency, but is not alone
sufficient. For each pairing $(f_i^m, s_j^m)$ on scanline $m$,
associated disparity measures $\eta$ and $\sigma$ are kept of the
mean and standard deviation of changes in $\delta_{ij}$ among
2-space connected pairings. The other half of the consistency
criterion is that the change in disparity
$|\delta_{ij}^m - \delta_{pq}^n|$ from the pairing $(f_i^m, s_j^m)$
(disparity $= \delta_{ij}^m$) to its nearest connected
2-space neighbour pair $(f_p^n, s_q^n)$ (disparity $= \delta_{pq}^n$)
be within $[\eta - \sigma,\ \eta + \sigma]$. A single such conflict
is not enough to remove a pairing (as, really, which pairing is
in error?). Rather, a pairing is only removed when it is found
to be inconsistent from 2 different sources.




[Diagram annotations, edge subscripts illegible:
connected edges in one image -> pairing (f, s) is bad;
connected edges in the other image -> pairing (f, s) is bad]


connectivity  without  consistent  correspondence 
Figure  6 

When a correlation pairing $(f_i^m, s_j^m)$ is removed, no
immediate attempt is made to reassign the $f_i^m$ or $s_j^m$
(although it will be in the near future), and these edges are
bypassed in the 2-space connectivity structure ... the paired
edges above and below them are joined now as 2-space nearest
connected pairs. The new change in disparity is evaluated,
and tested to see whether it lies in the
$[\eta - \sigma,\ \eta + \sigma]$ interval. When no further edge
pairings can be removed by this process, all pairings left
having a single inconsistency are removed. This more ruthless
removal could not be applied earlier, as it would delete good
pairings adjacent to bad ones in the process of removing the bad.
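A minimal sketch of the removal process follows, under simplifying assumptions that are not from the text: the pairings form a single chain ordered by scanline (each pairing has at most one connected neighbour above and one below), the correlation-conflict test is reduced to the disparity-change test alone, and the mean and standard deviation are computed once from the initial chain. The function name `enforce_continuity` is hypothetical.

```python
from statistics import mean, pstdev


def enforce_continuity(disparities):
    """Return the indices of pairings that survive, for one connected chain
    of pairings given by their disparities (ordered by scanline)."""
    alive = list(range(len(disparities)))
    changes = [abs(b - a) for a, b in zip(disparities, disparities[1:])]
    if not changes:
        return alive
    eta, sigma = mean(changes), pstdev(changes)  # assumed fixed throughout

    def inconsistencies(chain):
        # Count, per pairing, the links to its current neighbours whose
        # disparity change lies outside [eta - sigma, eta + sigma].
        count = {k: 0 for k in chain}
        for a, b in zip(chain, chain[1:]):
            change = abs(disparities[b] - disparities[a])
            if not (eta - sigma <= change <= eta + sigma):
                count[a] += 1
                count[b] += 1
        return count

    # Remove pairings found inconsistent from two sources; removing one
    # bypasses it, so its former neighbours become directly linked.
    while True:
        count = inconsistencies(alive)
        doubly_bad = [k for k in alive if count[k] >= 2]
        if not doubly_bad:
            break
        alive.remove(doubly_bad[0])

    # Final, more ruthless pass: drop anything with a single inconsistency.
    count = inconsistencies(alive)
    return [k for k in alive if count[k] == 0]
```

For the chain `[10, 10, 10, 25, 10, 10, 10]` only the outlier at index 3 is removed. For `[10, 10, 10, 10, 30]` the outlier sits at the end of the chain, has only one inconsistent link, and is removed in the final pass together with its good neighbour, which illustrates why that pass is held back until last.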

Certainly, a great deal more may be done with these
edges unpaired by the removal process ... 2-D connectivity
may make it clear where they should be really paired, the
reduced combinatorics (generally, as fewer edges are left
unpaired than began that way) may facilitate further
correlation with relaxed constraints (particularly, allowing edge
reversals - a left-right ordering in one image matching a
right-left ordering in the other), etc. ... effort will be put
in this direction later, once the correlation process itself is
felt to be sufficiently stable and successful.


Coarse  to  Fine  Correlation 

A concern over the loss of potentially useful edge
correlations through the combinatorial reduction then led
to a further change in the approach. In it contrast based
and half-edge based correlation approaches are combined.
It uses a resolution reduced 'planning' scheme that
exploits the coarse image structure in reducing the
correlation combinatorics. Here, the images are repeatedly halved
in resolution with a 1-2-2-1 averaging operator (in effect
removing the high frequency components, leaving the
low-frequency, coarser structure more visible). The edges found
at this resolution in the first and second images are treated
as significant, or landmark edges, and are correlated in the
contrast based edge style. For each set of assignments of
these edges having implied error not greater than that of
the 'best-so-far', the edges in the intervals between the
landmarks are half-edge correlated. The reasons for this dual
mode of operation are that: 1) it seemed a nice way to
merge the two (equally valid and equally incomplete) edge
matching assumptions, and 2) it was felt that the smoothed
image edges would be more contrast preserving over
viewpoint, having their high frequencies removed (this is a valid
assumption for narrow to medium angle stereo, perhaps not
for wide angle).
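The resolution halving can be illustrated on a single scanline. The sketch below assumes the 1-2-2-1 weights are normalized by their sum (6), that every second sample is kept, and that the window is clamped at the borders; none of these details is given in the text, and `halve_resolution` is a hypothetical name.

```python
def halve_resolution(scanline):
    """Halve a 1-D row of intensities with 1-2-2-1 weighted averaging."""
    weights = (1, 2, 2, 1)
    n = len(scanline)
    out = []
    for k in range(0, n - 1, 2):
        # Window covers samples k-1 .. k+2, clamped at the row ends.
        idx = (max(k - 1, 0), k, k + 1, min(k + 2, n - 1))
        out.append(sum(w * scanline[i] for w, i in zip(weights, idx)) / 6.0)
    return out
```

A step edge `[0, 0, 0, 0, 6, 6, 6, 6]` becomes `[0.0, 1.0, 5.0, 6.0]`: the edge survives at half the resolution but is smoothed. Applying the function twice gives a quarter-resolution row; for an image it would be applied along rows and then along columns.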


The cooperative algorithm used here
differed slightly from that described,
and some jag lines are still present in
this ...



Correlation  results  (from  Figure  4) 
after  cooperative  connectivity  enforcement 
Figure  7 




one  quarter  resolution  edges 




2  images’  coarse  depictions 
Figure  8 


There is always the danger with hierarchically
constrained searches such as this that the best solution at a
coarse level (reduced resolution) will not be consistent with
the best overall solution at the finest level (full resolution).
This problem was avoided in the branch&bound correlation
by doing the full resolution interval correlation whenever
the reduced resolution correlation suggested an
improvement was possible over the best yet achieved. This recursive
approach cannot be integrated into the Viterbi correlation
as it stands, and this is one aspect of the Viterbi algorithm
that we are looking into. Viterbi does, however, have some
pretty outstanding advantages: it is optimal (when using
the same assumption of no edge reversals in the images),
polynomial (as opposed to exponential), and is very very
fast.

Some recent results with the Viterbi algorithm follow.
The principal complaints with these at the moment are the
large number of apparently poor correlations (jag lines) they
have, and the lack of good vertical connectivity in the image
edge depictions. Our efforts to solve these problems are
leading us to consider adding other local constraints for
the edge matching, keeping alternative selections for each
edge pairing, and examining forms of interpolator
correlation [4] in the surviving intervals defined by the correlation
and cooperative consistency enforcement process. We have
confidence that these will significantly improve the results.
We would also like to work with more controlled images
(remember the requirement that the epipolar lines here be
parallel to the camera baseline; only in the CDC synthetic
images was this the case), and our recent reinstallation of
a pair of GE CCD cameras will help in this.


References

[1] Arnold, R. David, 'Local Context in Matching Edges
for Stereo Vision,' Proc. ARPA Image Understanding
Workshop, Cambridge, Mass., May 1978, 65-72.

[2] Forney, G. David Jr., 'The Viterbi Algorithm,' Proc.
IEEE, Vol. 61, No. 3, March 1973.

[3] Henderson, Robert L., Walter J. Miller, C. B. Grosch,
'Automatic Stereo Recognition of Man-Made Targets,'
Soc. Photo-Optical Instrumentation Engineers,
Vol. 186, August 1979.

[4] Panton, Dale J., 'A Flexible Approach to Digital Stereo
Mapping,' Photogrammetric Engineering and Remote
Sensing, Vol. 44, No. 12, December 1978, 1499-1512.



Synthesized  urban  images  (from  CDC) 


with some vertical glitches


All edges

including laterally inhibited edges


Excluding laterally inhibited edges

laterally inhibited edges are kept,
but not correlated


After correlation

Left side of edge correlations are superimposed with right
side of edge correlations


After cooperative process

3000 half-edge pairings (57% of possible edges), 7 sec.
first image, 12 sec. second image plus correlation.
37% removed by cooperative process.


Figure  9 


imagery 




After  Correlation 


After  cooperative  process 
5300 half-edge pairings (65% of possible ...
first image, 15 sec. second image plus ...
43% removed by cooperative process.




CONVERGENCE  PROPERTIES  OF  TWO  LABEL  RELAXATION 


Arden  R.  Helland 


Westinghouse  Systems  Development  Division 
Baltimore,  Maryland  21203 


ABSTRACT 

This paper analyzes the convergence properties
of the Peleg relaxation scheme [1] for the
one-dimensional, two-label case. It is shown that if a
label probability is identically zero, then that
label probability will remain zero. However, if
non-zero, then the label probability will either
increase toward unity or decrease toward zero,
depending on whether the average probability of the
neighborhood is above or below a threshold
determined by the relationship of the compatibility
coefficients. It is shown that the speed at which
ambiguity is resolved is reduced by the unlike
compatibility coefficients. Speed can be increased
with no change in the final results by reducing the
unlike coefficients to zero.


INTRODUCTION


Relaxation labeling has been shown to provide
reduced error rates in pixel classification
compared to procedures that are based solely on local
evidence. Relaxation classification uses initial
label probabilities, which may be modified based on
the label probabilities in the neighborhood (region)
of each decision point (pixel).[1,2] The objective
for the cases of interest is to segment darker
regions from lighter regions in the presence of
significant noise which causes ambiguity within the
regions. The process appears to have the
intriguing capability of making a very faint target
in selected windows appear obvious after a few
iterations.[3,4] It, therefore, appears that
relaxation simultaneously provides the advantages of
classification by regional (vs local) evidence as
well as the contrast of segmentation, but using
the principle of deferred commitment to resolve
ambiguity.


THRESHOLD


The following analysis is based on the
probability updating rule developed by Peleg for two
labels. The two labels will be called b and w, so
that Pib is the probability that the ith pixel is
black. Likewise, Pjb is the probability that the
jth neighbor is black. In the general case, the
compatibility coefficients are a function of the
orientation of the jth neighbor to the ith pixel
as well as their labels. This paper is based on
the assumption that the compatibility coefficients
are independent of (or averaged over) the
orientation. Therefore it is a function only of the
labels, so that for labels α and β:

$$r_{ij}(\alpha, \beta) = r_{\alpha\beta}$$

For labels b (black) and w (white), therefore, we
have rbb, rbw, rww, and rwb. rbb and rww are
called the alike label coefficients; rbw and rwb
are the unlike label coefficients.

The updating rule uses the following
intermediate summations where K is the iteration index:

$$S_b^K = P_{ib}^K \sum_{j=1}^{N} P_{jb}^K \cdot r_{bb} + P_{ib}^K \sum_{j=1}^{N} P_{jw}^K \cdot r_{bw}$$

$$S_w^K = P_{iw}^K \sum_{j=1}^{N} P_{jw}^K \cdot r_{ww} + P_{iw}^K \sum_{j=1}^{N} P_{jb}^K \cdot r_{wb}$$

These are used to generate probabilities for the
next iteration as follows:

$$P_{ib}^{K+1} = \frac{S_b^K}{S_b^K + S_w^K} \qquad \text{and} \qquad P_{iw}^{K+1} = \frac{S_w^K}{S_b^K + S_w^K}$$

$S_b^K$ and $S_w^K$ may be reduced to two variables by
making the following substitutions:

$$P_\alpha = \frac{1}{N} \sum_{j=1}^{N} P_{j\alpha}^K, \qquad P_w = 1 - P_b$$

Also, for simplicity, the iteration index, K, will
be dropped; where K+1 occurs it will be indicated
by a prime ('). Therefore, by multiplying by the
number of neighbors N and combining terms, we have

$$S_b = P_{ib} \left[ P_b (r_{bb} - r_{bw}) + r_{bw} \right] = P_{ib} \cdot F_b$$

$$S_w = \left[ 1 - P_{ib} \right] \left[ (1 - P_b)(r_{ww} - r_{wb}) + r_{wb} \right] = \left[ 1 - P_{ib} \right] \cdot F_w$$

Now the change in the black probability of the
ith pixel for each iteration is defined as follows:

$$\Delta P_{ib} = P'_{ib} - P_{ib} = \frac{S_b}{S_b + S_w} - P_{ib} = \frac{S_b - P_{ib}(S_b + S_w)}{S_b + S_w} = \frac{P_{ib}\,(F_b - F_w)\,(1 - P_{ib})}{P_{ib} \cdot F_b + (1 - P_{ib})\, F_w}$$
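As an illustration of the two-label updating rule, the sketch below performs one iteration for a single pixel using the reduced quantities Fb and Fw. The function name and argument layout are assumptions made for the example; the coefficient values used in the usage note are hypothetical and are not taken from the paper.

```python
def update_black_probability(p_ib, neighbour_p_b, r_bb, r_bw, r_ww, r_wb):
    """One relaxation iteration for the black probability of pixel i.

    p_ib          -- current black probability of the centre pixel
    neighbour_p_b -- black probabilities of its N neighbours
    """
    p_b = sum(neighbour_p_b) / len(neighbour_p_b)   # neighbourhood average Pb
    f_b = p_b * (r_bb - r_bw) + r_bw                # Fb
    f_w = (1.0 - p_b) * (r_ww - r_wb) + r_wb        # Fw
    s_b = p_ib * f_b                                # Sb
    s_w = (1.0 - p_ib) * f_w                        # Sw
    return s_b / (s_b + s_w)                        # new Pib
```

With alike coefficients of 1 and unlike coefficients of 0, an ambiguous pixel (`p_ib = 0.5`) in a neighbourhood averaging 0.75 moves to 0.75 in one step, while a pixel whose black probability is exactly 0 stays at 0, as the analysis predicts.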




The denominator must be non-zero for all normal
cases; the only condition that would permit both Sb
and Sw to be zero is if Pib and Pb have absolutely
certain but opposite labels.

Because the product Pib (1-Pib) is a quadratic
function, it is zero for either Pib = 0 or Pib = 1,
and positive for all other values 0 < Pib < 1, with
a maximum value occurring at Pib = .5. Therefore,
ΔPib depends on the Fb - Fw term as follows:

$$F_b - F_w > 0 \;\Rightarrow\; \Delta P_{ib} > 0$$

Now, Fb and Fw are dependent only on Pb and the
compatibility coefficients, so that Fb - Fw > 0
yields the following result:

$$P_b > \frac{r_{ww} - r_{bw}}{r_{ww} - r_{bw} + r_{bb} - r_{wb}} = T_b$$

Therefore, if the average black probability of the
neighborhood (Pb) is above the threshold Tb, the
black probability of the ith center pixel will be
increased. If Pb is identically equal to Tb, Pib
is unchanged. Of course, Tw = 1 - Tb.
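The step from Fb - Fw > 0 to the threshold can be written out. The following is a sketch that assumes the sum rww - rbw + rbb - rwb is positive (that is, the alike coefficients together exceed the unlike ones), an assumption the text does not state explicitly:

$$
\begin{aligned}
F_b - F_w &= \left[ P_b (r_{bb} - r_{bw}) + r_{bw} \right] - \left[ (1 - P_b)(r_{ww} - r_{wb}) + r_{wb} \right] \\
          &= P_b \,(r_{ww} - r_{bw} + r_{bb} - r_{wb}) - (r_{ww} - r_{bw}) \\
F_b - F_w > 0 \;&\Longleftrightarrow\; P_b > \frac{r_{ww} - r_{bw}}{r_{ww} - r_{bw} + r_{bb} - r_{wb}} = T_b \\
T_w = 1 - T_b &= \frac{r_{bb} - r_{wb}}{r_{ww} - r_{bw} + r_{bb} - r_{wb}}
\end{aligned}
$$

The last line shows that the white threshold has the same form as Tb with the roles of the two labels exchanged.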


Therefore, we may conclude the following
regarding the response of ΔPib to Pib and Pb.

1. For lighter regions:

Pib < 1 and Pb < Tb => P'ib < Pib

2. For darker regions:

Pib > 0 and Pb > Tb => P'ib > Pib

3. If Pb < Tb for all iterations, Pib -> 0

4. If Pb > Tb for all iterations, Pib -> 1

Therefore, Pib = 0 and Pib = 1 are values toward
which all pixels within consistent regions will
converge. Although Pb may change from one
iteration to the next, it is unlikely to change its
relationship to the threshold (except if it is very
near the threshold or if the initial label
probabilities are inconsistent).

SPEED


Inspection of Fb and Fw shows that the unlike
coefficients rbw and rwb have a tendency to reduce
ΔPib.


Now, 1-Pib is the difference between Pib and
absolute black label probability. Therefore,
relative speed is defined as the ratio of ΔPib to the
remaining difference; this cancels the 1-Pib term
in the numerator. Therefore,

$$C_b = \frac{\Delta P_{ib}}{1 - P_{ib}} = \frac{P_{ib}\,(F_b - F_w)}{P_{ib} \cdot F_b + (1 - P_{ib})\, F_w}$$

It can be shown that Cb is maximum with respect to
Pib and Pb when both approach the constraint limit
of unity. Therefore, if Cb is evaluated at Pib =
Pb = 1:

$$C_b = \frac{F_b - F_w}{F_b} = \frac{r_{bb} - r_{wb}}{r_{bb}}$$

Likewise,

$$C_w = \frac{-\Delta P_{ib}}{P_{ib}} = \frac{F_w - F_b}{F_w} = \frac{r_{ww} - r_{bw}}{r_{ww}}$$

These expressions show that non-zero unlike
coefficients directly reduce speed of convergence.

RESULTS

The preceding analysis has shown that the
speed of convergence toward absolute labels and the
point of divergence threshold may be derived from
the compatibility coefficients. These measures of
speed and threshold were computed for some of the
cases of coefficients used by Rosenfeld and Smith
in their paper "Thresholding Using Relaxation", in
the publication by the University of Maryland
Computer Vision Laboratory.[4] The principal cases
for the "dark tank picture" are reproduced here as
Figures 1 through 5 (they were originally Figures
10, 6, 13, 11 and 12). The computed measures of
speed and threshold are tabulated as follows:

Figure No.       1     2     3     4     5

Speed Cd        .28   .38   .29   .33   .89

Threshold Tl    .77   .74   .58   .50   .50
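The speed and threshold measures follow directly from the four compatibility coefficients. The sketch below simply evaluates the closed forms for Tb, Cb and Cw; the function names are hypothetical, and the coefficient values in the usage note are illustrative only (the coefficients behind the tabulated cases are not given in this text).

```python
def threshold_black(r_bb, r_bw, r_ww, r_wb):
    """Tb: neighbourhood average above which Pib increases."""
    return (r_ww - r_bw) / (r_ww - r_bw + r_bb - r_wb)


def speed_black(r_bb, r_wb):
    """Cb evaluated at Pib = Pb = 1."""
    return (r_bb - r_wb) / r_bb


def speed_white(r_ww, r_bw):
    """Cw, the corresponding measure for the white label."""
    return (r_ww - r_bw) / r_ww
```

For example, with alike coefficients of 1.0 and unlike coefficients of 0.5 the threshold is 0.5 and both speeds are 0.5; setting the unlike coefficients to zero leaves the threshold at 0.5 but raises both speeds to 1.0, in line with the statement that unlike coefficients reduce speed without being needed for the final result.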


The first four figures are given in order of
decreasing light threshold (increasing threshold for
the dark object). Therefore, the size of the
object should tend to decrease as less of its fuzzy
boundary is classified as dark. This is fairly
readily observable in the representation for both
the updated probabilities (displayed as revised gray
level) and the histogram display for each iteration.
Ideally, the threshold region should develop a
"valley" in the histogram as values converge to the
two extremes. Although this tendency does occur,
the following reasons make this less obvious:

1. Movement near the threshold region is slow,
so that separation on initial iterations is
slow until ambiguity is resolved.

2. Unless the light and dark thresholds are
both 0.50, the region with lower threshold
expands into its fuzzy boundary region. For
light thresholds greater than .5 (Figures
1 through 3) the dark object expands, so
there is continued motion from right to
left through the threshold region of the
histogram.

3. The histograms are normalized to the
highest value, so convergence to the most
common label tends to suppress the
remaining portion of the histogram.

The first four figures show varying degrees of
speed, all relatively slow. Figure 2 has the
highest speed measure; examination shows that
segmentation of the image is more complete (and the
histogram more completely converged to the two
extremes) than for Figures 1, 3 or 4.

Figures 4 and 5 have the same threshold, but
offer a dramatic illustration of the difference in
speed of convergence. Examination shows that
iteration 8 of Figure 4 indicates results approximately
comparable to iteration 2 of Figure 5. The faster
convergence of Figure 5 obtains virtually complete
segmentation. In addition, because the light and
dark thresholds are equal, the boundary of the
object is stationary.
This is also indicated by the convergence of the
histogram to the two extremes, with no change after
segmentation is complete.

Figure 5 did not achieve segmentation of the
entire object because the light threshold was so
low that the less dark regions of the object were
labeled "light". Rosenfeld and Smith define a
process whereby the image gray values may be
transformed to adjust the probabilities so that the
desired gray value results in the probability
threshold. Of course, selection of the desired
gray level threshold depends on the type of imagery
and the application of the image processing results.


CONCLUSIONS 


Boundary stability is related to the relaxation
thresholds. If the threshold for one of the labels
is less than for others, then regions with that
label will tend to expand. However, small regions
tend to shrink due to the effect of the surroundings
on convex boundaries. This appears to imply that
for large regions of interest, most shape detail
will be preserved with (nearly) equal thresholds
for each label. However, the response to small
regions of one label may be improved by a moderate
reduction in that threshold. The expansion due to
unequal thresholds may be used to counteract the
tendency for small regions to shrink.


The analysis of speed showed that the change
in probability of the center pixel is reduced by
the proportion of the unlike coefficient to the
alike coefficient for each label. Maximum speed is
obtained when unlike coefficients are zero. Also,
it is shown that if the probability for the center
pixel is zero for either label, then that
probability will not change as a result of further
relaxation iterations. Therefore, input data should
avoid zero probability to permit full operation of
the relaxation functions.

When the average probability of the
neighborhood is equal to (or nearly equal to) the
threshold, then the probability of the center pixel is
unchanged (or changed only slightly). If gray
levels near the probability threshold are
considered as ambiguous levels, then little change
occurs until the ambiguity becomes resolved by
further iterations. In other words, the
segmentation process inherent in relaxation proceeds
very cautiously as long as the neighborhood
labeling remains ambiguous, deferring commitment to any
label until information from more distant regions
resolves the ambiguity. However, if the
neighborhood labeling is not near the probability threshold,
then the label probabilities are driven toward zero
and unity at a rate that is maximized if the
unlike coefficients are zero.


REFERENCES 

1. S. Peleg, A New Probabilistic Relaxation Scheme,
TR-711, Computer Science Center, University of
Maryland, November 1978.

2. A. Rosenfeld, R. A. Hummel & S. W. Zucker, Scene
Labeling by Relaxation Operations. IEEE TSMC-6

3. A. Rosenfeld, A. Danker & C. Dyer, Blob Extraction
by Relaxation. Proceedings: Image Understanding
Workshop, April 1979, 61-65.

4. A. Rosenfeld & R. Smith, Thresholding by Relaxation,
Computer Science Center, University of
Maryland.




INVESTIGATION  OF  VLSI  TECHNOLOGIES  FOR  IMAGE  PROCESSING 



WILLIAM  L.  EVERSOLE  AND  DALE  J.  MAYER* 


TEXAS  INSTRUMENTS  INCORPORATED 
P.O.  Box  222013,  M/S  3407 
Dallas,  Texas  75222 




ABSTRACT 

This paper summarizes recent work performed
by Texas Instruments Incorporated for
Carnegie-Mellon University on the investigation of very
large scale integration (VLSI) implementations for
image processing. To fully exploit VLSI technologies,
system engineers must appreciate the constraints
as well as the benefits of having a million
transistors on a single chip. Optimization
of the architecture of an image processor is
imperative to ensure a broadly applicable
programmable component. A discussion of the
architecture trade-offs and the hardware
implementation of a programmable image processor
to implement a sum of products operator is
presented.

INTRODUCTION 

To take advantage of rapid advances in
integrated circuit technology, the hardware
architecture of an image processor must be
optimized in terms of execution time, size,
power, and cost. This optimized architecture
will allow the implementation of highly complex
image processing functions on a monolithic
substrate. Examples of the need for optimized
architectures are image processing algorithms
that involve array multiplications of the form

$$y = \sum_{i=0}^{I-1} a_i x_i \qquad (1)$$


where the $a_i$'s represent a set of fixed weighting
coefficients known a priori and the $x_i$'s
represent a set or a sequence of input values.
Equation (1) can be used to calculate the
coefficients of various transforms such as
Fourier, Cosine, Hadamard, Haar, and others
used in many signal processing systems. Where
two dimensional transforms are needed, successive
one dimensional transforms can be used if the
transforms are separable.


Many image processing algorithms require
the discrete convolution of a two dimensional
input image with a two dimensional convolution
array. For these algorithms, Equation (1) can
be represented in two dimensions by

$$y(m_1, m_2) = \sum_{n_1}^{N_1} \sum_{n_2}^{N_2} x(n_1, n_2)\, A(m_1 - n_1 + 1,\; m_2 - n_2 + 1) \qquad (2)$$


where $y(m_1, m_2)$ represents the discrete
convolution of the input $N_1 \times N_2$ image array, x, and
the convolution array, A. These mathematical
operations are often based on neighboring
pixel values and are termed neighborhood
operators. Examples of neighborhood operators
include noise smoothing in which the components
of A are of low pass form, edge crispening in
which the components of A are of high pass form
and linear edge enhancement in which several
masks are convolved with the original image to
produce maximum output for different slope
directions.
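A direct evaluation of the two dimensional sum of Equation (2) can be sketched in a few lines. The version below uses 0-based indices (so the "+1" offsets of the 1-based formula disappear), treats A as zero outside its support, and returns the full-size output; these conventions and the name `convolve2d` are assumptions made for the example.

```python
def convolve2d(x, a):
    """Full discrete 2-D convolution of image x with convolution array a.

    Both arguments are lists of equal-length rows.
    """
    n1, n2 = len(x), len(x[0])
    l1, l2 = len(a), len(a[0])
    y = [[0] * (n2 + l2 - 1) for _ in range(n1 + l1 - 1)]
    for i in range(n1):
        for j in range(n2):
            for p in range(l1):
                for q in range(l2):
                    # x(n1, n2) * A(m1 - n1, m2 - n2) with m = n + (p, q)
                    y[i + p][j + q] += x[i][j] * a[p][q]
    return y
```

Convolving `[[1, 2], [3, 4]]` with a 2 by 2 array of ones gives `[[1, 3, 2], [4, 10, 6], [3, 7, 4]]`. Each output value costs one multiplication per element of A, which is the multiplier load the text describes as the obstacle at video data rates.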


Equation (1) can be implemented using
digital multipliers and adders; however, the
size and power required to perform the
multiplications at video data rates with the accuracy
needed for most image processing applications
is prohibitive from a hardware standpoint. A
technique for realizing Equation (1) that does
not require digital multiplication is the
distributed arithmetic technique.1-3 Distributed
arithmetic allows the implementation of a
sliding sum of products (convolution) of an input
word sequence with a set of weighting
coefficients using a table look-up procedure.
Distributed arithmetic techniques can also be
used to implement a nonsliding sum of products
of an input word set with a set of weighting
coefficients.

This paper discusses distributed arithmetic
and the hardware implementation of a
programmable sum of products operator based on
distributed arithmetic.


DISTRIBUTED  ARITHMETIC 


*Mr. Mayer is now with Northwest Datacom


The operation performed by the distributed
arithmetic technique is the convolution defined
by

$$y_n = \sum_{i=0}^{I-1} a_i\, x_{n-i} \qquad (3)$$

where $x_{n-i}$ is a B-bit word represented by

$$x_{n-i} = \sum_{j=0}^{B-1} x_{(n-i)j}\, 2^j, \qquad x_{(n-i)j} \in \{0, 1\} \qquad (4)$$

Substituting Equation (4) into Equation (3)
yields

$$y_n = \sum_{j=0}^{B-1} \left[ \sum_{i=0}^{I-1} a_i\, x_{(n-i)j} \right] 2^j \qquad (5)$$

Since the values $a_i$ are fixed, the $2^I$ possible
values of the bracketed term of Equation (5)
may be calculated a priori and stored in a
memory. For each j of the outer summation, the
value of the bracketed term is recalled from the
memory location whose address is formed by the
I-bit binary word

$$Z = \left[ x_{(n)j},\ x_{(n-1)j},\ x_{(n-2)j},\ \ldots,\ x_{(n-I+1)j} \right] \qquad (6)$$

The word stored in this location is given by

$$C(Z) = \sum_{i=0}^{I-1} a_i\, x_{(n-i)j}, \qquad x_{(n-i)j} \in \{0, 1\} \qquad (7)$$

These values recalled from memory are weighted
by the factor $2^j$ and summed over j.
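The table look-up procedure can be illustrated in software. The sketch below assumes unsigned B-bit input words, as in Equation (4), and places the bit taken from word i at bit position i of the address; the bit ordering of the address and the function names are assumptions, not details of the hardware described here.

```python
def build_table(coeffs):
    """Precompute C(Z) for all 2**I addresses Z (Equation (7))."""
    size = 1 << len(coeffs)
    return [sum(a for i, a in enumerate(coeffs) if (z >> i) & 1)
            for z in range(size)]


def da_sum_of_products(coeffs, words, bits):
    """Evaluate sum(a_i * x_i) without multiplying by the coefficients.

    words[i] plays the role of x_(n-i); each must fit in `bits` bits.
    """
    table = build_table(coeffs)
    y = 0
    for j in range(bits):
        # Form the address from bit j of every input word (Equation (6)).
        z = 0
        for i, x in enumerate(words):
            z |= ((x >> j) & 1) << i
        # Weight the recalled table value by 2**j and accumulate.
        y += table[z] * (1 << j)
    return y
```

For coefficients `[3, -1, 2]` and 3-bit words `[5, 7, 2]` the routine returns 12, the same as `3*5 - 1*7 + 2*2`. The memory holds 2**I entries and B look-ups are needed per output, which is the trade-off between memory size and number of look-ups mentioned in the text.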


Tradeoffs between memory size, number of
adders and the number of table look-ups were
discussed at a previous workshop.4


ARCHITECTURES 


In image processing applications
such as video bandwidth reduction,
forward-looking infrared (FLIR) autocueing, target
..., and image understanding,
algorithms based on Equation (1) are used which
operate on fixed 8 by 1 blocks or sliding 3 by
3 blocks of pixels. Real-time video data rates
of 10 mega-samples/second and 6- to 8-bit
words are desired.


The distributed arithmetic technique can
be used to implement these algorithms with
both fixed and sliding blocks of data. Figure
1 shows a block diagram of a distributed
arithmetic implementation capable of operating on 9
by 1 or 3 by 3 blocks of pixels. The 6-bit
input words (A, B, and C) are sampled at 10
MHz and loaded into three addressable
latches as three words in parallel or as three ...

Figure 1. Distributed Arithmetic Implementation of a Programmable Sum of Products Operator


Figure 2. RAC Implementation of a Programmable Sum of Products Operator

[...] memory blocking and [...] frequency can be [...]. Pipelining techniques [...] eight adders [...] have a maximum of 100 ns to [...] operate on sliding blocks [...] (internal clock rate [...]).

A block diagram of a RAC implementation is shown in Figure 2. Nine addressable latches [...]. The parallel-to-serial registers convert [...] to bit-serial form for addressing [...]. When operating on [...] by 1 pixel blocks, the three inputs (A, B, and C) are connected together. Nine sample periods are required to load the nine addressable latches.




[...] three latches. Parallel-to-serial shift registers (P/S) convert each [...] parallel input word into a bit-serial [...] which is clocked sequentially through three [...]-stage shift registers. The outputs of [...] stage of each shift register form the 9-bit memory address.


When operating on [...] by 1 pixel blocks, the three inputs (A, B, and C) are connected together. The three input latches are loaded sequentially and the parallel-to-serial registers are loaded every third sample period; therefore, the shift registers, memory, and accumulator must operate at twice the sample rate, i.e., 20 MHz.


When operating on [...] 3 by 3 blocks, the three input latches are loaded in parallel at the sample rate and, therefore, the 6-bit shift registers, memory, and accumulator must operate at six times the sample rate, i.e., 60 MHz.




When operating on 3 by 3 pixel blocks, three sample periods are required to load the nine addressable latches. Assuming continuous operation, this requires the parallel-to-serial registers to address the memory six times [...], i.e., a 20 MHz internal clock rate. Pipelining techniques can be used to give the parallel-to-serial registers, memory, and accumulator each a maximum of 50 ns to [...]


Although a sliding sum [...] directly performed with a [...], three RAC processors can be [...] phase clocking to perform [...]. [...] RAC equivalent of a distributed [...] processor is three times [...] required internal clock rate [...] factor of three.


Figure 3. RAC Implementation of Programmable Sum of Products Operator With Memory Blocking and Multiple-Bit Addressing




Figure 4. Implementation With Maximum Memory Blocking and Maximum Multiple-Bit Addressing

Each RAC implementation produces the same output for [...] (positive only) inputs if full precision [...] operation as indicated in Figures [...]. However, the code stored in memory for each approach is different. The word corresponding to each 9-bit address, Z, for Figure 2 is given by

$$ C(Z) = \sum_{i=0}^{8} a_i \, z_i, \qquad z_i \in \{0,1\} \qquad (8) $$

For Figure 3, the word corresponding to each 6-bit address of memory q is

$$ C(q,Z) = \sum_{i=0}^{2} a_{3q+i} \, [Z_{2i} + 2 Z_{2i+1}], \qquad Z_{2i}, Z_{2i+1} \in \{0,1\} \qquad (9) $$

For Figure 4, the word corresponding to each B-bit address of memory q is

$$ C(q,Z) = \sum_{m=0}^{B-1} a_q \, Z_m \, 2^m, \qquad Z_m \in \{0,1\} \qquad (10) $$

To simplify the accumulator and adder functions of the various architectures, the code in ROM should be stored in two's-complement form.

In the architectures shown in Figures 1 through 3, the input words (A, B, and C) are restricted to positive values. To accommodate negative input values in two's-complement form, a subtraction option in the accumulator function is required for Figures 1 and 2. The value recalled from the sign bit address (i.e., the 9-bit address obtained from the sign bits in registers R0-R8) is subtracted in the accumulator. For input value representation other than two's complement, the architecture complexity is further increased. Figure 3 cannot readily accommodate negative inputs due to the multiple-bit addressing, i.e., the sign bits are not distinguishable from the other bits.
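The subtraction option for two's-complement inputs can be sketched as follows. Because the sign bit of a two's-complement word carries weight -2^(B-1), the word recalled for the sign-bit address is subtracted instead of added. The code is an illustrative model only; the helper names and test values are hypothetical, and single-bit addressing as in Figures 1 and 2 is assumed.

```python
def build_table(coeffs):
    """Stored word for every address; bit i of the address selects coeffs[i]."""
    return [sum(a for i, a in enumerate(coeffs) if (z >> i) & 1)
            for z in range(1 << len(coeffs))]


def to_twos_complement(value, nbits):
    """Encode a signed integer as an nbits-wide two's-complement word."""
    return value & ((1 << nbits) - 1)


def da_signed(table, words, nbits):
    """Bit-serial sum of products for two's-complement input words."""
    acc = 0
    for j in range(nbits):
        z = 0
        for i, w in enumerate(words):
            z |= ((w >> j) & 1) << i
        term = table[z] << j
        if j == nbits - 1:
            acc -= term   # sign-bit address: subtract
        else:
            acc += term   # magnitude bits: add
    return acc


coeffs = [3, -1, 4, 1, -5, 9, 2, -6, 5]        # hypothetical weights
values = [-5, 12, -32, 31, 0, -1, 7, -17, 3]   # hypothetical signed 6-bit data
words = [to_twos_complement(v, 6) for v in values]
signed_result = da_signed(build_table(coeffs), words, nbits=6)
direct = sum(a * v for a, v in zip(coeffs, values))
```

The same table serves both signed and unsigned data; only the final accumulate step changes.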

The architecture of Figure 4 accepts negative inputs using any representation, i.e., two's complement, sign magnitude, offset binary, etc. This is due to the fact that with maximum memory blocking and maximum multiple-bit addressing, the architecture of Figure 4 looks up each complete product term, $a_i x_i$, of Equation (1) rather than a partial product. Equation (10) must be modified to accommodate the chosen number representation.

Another advantage of the architecture of Figure 4 is that the function stored in each memory may be any linear or nonlinear function of the input signal, $x_i$, i.e., the values stored are not limited to the linear product of $x_i$ with a weighting coefficient $a_i$. For example, the values stored in each memory may represent the square of $x_i$ and, thus, the architecture of Figure 4 implements the sum of squares. Other nonlinear functions of the $x_i$'s, such as polynomials, trigonometric functions, exponential functions, magnitude functions, etc., may be implemented in each memory. This allows nonlinear masking operations, such as logarithmic Laplacian edge detection, to be performed.
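The idea of storing an arbitrary function of each input in its own memory can be modeled in a few lines. The sketch below is an illustration, not the authors' design: it assumes one table per input addressed by the full 6-bit input word, and the function names and data are hypothetical.

```python
def build_function_tables(functions, nbits):
    """One table per input: entry x holds f(x) for every nbits-wide word x."""
    return [[f(x) for x in range(1 << nbits)] for f in functions]


def lookup_sum(tables, samples):
    """One look-up per input followed by a sum; no multiplier is used."""
    return sum(table[x] for table, x in zip(tables, samples))


samples = [17, 42, 63, 0, 8, 21, 33, 5, 60]   # hypothetical 6-bit pixels

# Sum of squares: every memory stores x**2.
square_tables = build_function_tables([lambda x: x * x] * 9, nbits=6)
sum_of_squares = lookup_sum(square_tables, samples)

# Ordinary weighted sum: memory i stores a_i * x.
coeffs = [3, -1, 4, 1, -5, 9, 2, -6, 5]       # hypothetical weights
product_tables = build_function_tables(
    [lambda x, a=a: a * x for a in coeffs], nbits=6)
weighted_sum = lookup_sum(product_tables, samples)
```

Changing the operator from a linear mask to a nonlinear one then amounts to reloading the tables, which is the flexibility the text attributes to Figure 4.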

HARDWARE  IMPLEMENTATION 

To fully explore the architectural implications of a programmable sum of products operator, a hardware implementation of Equation (1) has been designed and fabricated. The breadboard is capable of operating on sliding blocks of data as illustrated in Figure 1 and fixed blocks of data as illustrated in Figure 2. A block diagram of the hardware implementation is shown in Figure 5. The breadboard consists of nine input latches, nine parallel-in serial-out shift registers, a fast 512x12 bit memory for temporary storage of the partial products, an EPROM for permanent storage of the partial products, shift and accumulate circuitry, tri-state output latches, and control circuitry.

The input latch structure is hardware or software selectable for serial data entry at DATA INPUT C or parallel data entry at DATA INPUT A, DATA INPUT B, and DATA INPUT C. This facilitates the implementation of a 9 point transversal filter or a 3 x 3 sliding window operator, respectively. The input data word length is hardware selectable from 1 bit to 8 bits and is hardware or software selectable as two's complement or magnitude format.

A set of weighting coefficients, $a_i$, determines a set of 512 partial products which are stored in a 512 x 12 bit high speed random access memory. The data to be stored in each memory location is given by Equation (7). These partial products may be down loaded on the data bus from [...] by a controlling processor. The partial products are hardware selectable as two's complement or magnitude format. The partial products obtained from the RAM during [...] in a carry-save [...] with data shifting to weight the signifi[...]. [...] products output data [...] latch by a tri-state output [...].
The data input section consists of 9 latches connected to form a 9 stage long by 8 bit wide shift register. The outputs of each stage are also connected to 9 parallel to serial conversion registers which form the data for the RAC operation. The data input operation is controlled by the INPUT CONTROLLER. The [...] has a hardware or software selectable [...] count as one of its inputs. The INPUT RESET line is pulsed low to initialize the controller. The INPUT STROBE line shifts data through the input latches and clocks the controller at its leading edge. After a sufficient number of strobe pulses occur, the INPUT CONTROLLER generates a LOAD pulse which transfers the contents of the input latches to the parallel to serial registers and starts the high speed asynchronous RAC CONTROLLER. The LOAD pulse also resets the INPUT CONTROLLER so that new data can be shifted into the input latches while the RAC operation is taking place. A second INPUT RESET pulse is not required.

The RAC CONTROLLER operates from an asynchronous internal 16.7 MHz oscillator (60 nsec period). This is the maximum clock rate for the components selected for the shift registers and accumulator. After a LOAD pulse is received from the INPUT CONTROLLER, the RAC CONTROLLER sequences the operations of the PARALLEL to SERIAL REGISTERS, the PARTIAL PRODUCT MEMORY, the SHIFT AND ACCUMULATE function, and the OUTPUT LATCH. It also provides signals which may be monitored by an external processor if desired. After the final INPUT STROBE pulse, the result of the calculation is available at the TRI-STATE OUTPUT LATCH in [Bx + 6 1/2] internal clock cycles. The additional 6 1/2 clock cycles are due to the pipelined architecture.


$$ t_{RAC} = [60 \times B_x + 390]\ \text{nsec} \qquad (11) $$

where $t_{RAC}$ is the time required to process the data and Bx is the number of significant bits in an input data word. For 8 bit data, this is 870 nsec. The RAC CONTROLLER will bring the READY line high when the output is available and will not allow another result to overwrite the output latch data until the READ ACK input is pulled high by the controlling processor. The OUTPUT BUFFER FULL line goes high, the INPUT STROBE line is inhibited, and the internal oscillator is stopped if an overwrite condition exists. The tristate outputs are enabled by pulling the external OUTPUT ENABLE line low.

Figure 5. Programmable Sum of Products Unit
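The latency figure quoted above follows directly from the 60 nsec clock period and the 6 1/2 cycle pipeline overhead. A small sketch of that arithmetic, with hypothetical constant and function names:

```python
CLOCK_PERIOD_NS = 60      # internal oscillator period stated in the text
PIPELINE_CYCLES = 6.5     # extra cycles attributed to pipelining


def rac_latency_ns(bits_per_word):
    """Time from the final INPUT STROBE to a valid output, in nanoseconds."""
    return (bits_per_word + PIPELINE_CYCLES) * CLOCK_PERIOD_NS


latency_8_bit = rac_latency_ns(8)   # (8 + 6.5) * 60
```

For 8 bit data this evaluates to 870 nsec, the value given in the text, and the fixed 390 nsec term is simply 6.5 cycles of 60 nsec.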

Due to the pipeline organization of the RAC hardware, the PARALLEL/SERIAL REGISTERS may be loaded as soon as Bx bits of data have been shifted out of them, i.e., Bx internal clock cycles or 60 x Bx nanoseconds after the last INPUT STROBE pulse. For 8 bit data, this is 480 nsec, which gives a throughput rate of about 2 MHz. If the INPUT CONTROLLER generates a LOAD pulse before the PARALLEL/SERIAL REGISTERS are emptied, the INPUT BUFFER FULL line will go high and the INPUT STROBE line will be inhibited until the registers are emptied in order to prevent overwriting any data.

From the above discussion it can be seen that the maximum input data rate depends on the number of bits in the input data words (Bx) and the number of INPUT STROBE cycles. The number of INPUT STROBE pulses (N) between parallel to serial conversions determines the type of operation performed. For sliding 3x3 or 9x1 filter applications only one strobe pulse is needed between parallel to serial conversions. For non-sliding 3x3 window operation three strobe pulses are needed, and for a 9x1 transform, nine strobe pulses are needed. The maximum input data rate is given by

$$ f_{MAXIN} = \frac{N}{B_x \times 60\ \text{nsec}} \qquad (12) $$

For sliding window or transversal filter applications with 8 bit data, $f_{MAXIN}$ = 2 MHz. For applications such as an 8 x 1 transform with 8 bit data, $f_{MAXIN}$ = 16 MHz.

The maximum output data rate is always given by

$$ f_{MAXOUT} = \frac{1}{B_x \times 60\ \text{nsec}} \qquad (13) $$
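The two rate expressions can be checked numerically. The following sketch evaluates them for the cases mentioned in the text; the function names are hypothetical, and rates are returned in MHz.

```python
def max_input_rate_mhz(strobes_per_load, bits_per_word, clock_ns=60):
    """Maximum input sample rate: N / (Bx * clock period)."""
    return strobes_per_load * 1000.0 / (bits_per_word * clock_ns)


def max_output_rate_mhz(bits_per_word, clock_ns=60):
    """Maximum output rate: 1 / (Bx * clock period)."""
    return 1000.0 / (bits_per_word * clock_ns)


sliding_rate = max_input_rate_mhz(1, 8)     # one strobe per output
transform_rate = max_input_rate_mhz(8, 8)   # eight strobes per output
output_rate = max_output_rate_mhz(8)
```

With 8 bit data these come to roughly 2.1 MHz for the sliding case and 16.7 MHz for eight strobes per load, consistent with the rounded 2 MHz and 16 MHz figures in the text; the output rate equals the sliding input rate.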


The breadboard also has the capability to operate in parallel with other breadboards in order to provide throughput at real time (TV) data rates. Three control lines shown dashed in Figure 5 are provided to synchronize this operation. These lines, TV RESET OUTPUT TV STROBE, and CLEAR OUTPUT ENABLE, are not normally connected when operating a single breadboard.

The breadboard is 12" x 12" x 5", weighs 7 lbs, and dissipates 15 watts.

CONCLUSIONS 

Architectures for performing array multiplication without multipliers have been discussed. Tradeoffs between memory size, processor speed, and flexibility were made, and a discussion of a hardware implementation of a ROM accumulator processor was presented. This hardware consisted of standard off-the-shelf components requiring more power and size than an integrated circuit version; however, the breadboard is an invaluable aid in the definition of an LSI/VLSI implementation. The breadboard version allows evaluation of the algorithm as well as the discovery and evaluation of the problems, risks, and options.

Recent advances in component technology now make possible the realization of digital processors capable of performing array multiplications; however, an understanding of architectural tradeoffs that may affect algorithm and hardware performance is necessary before actual integrated circuit design begins, to prevent a poorly defined function.

Under  a  parallel  contract  to  the  Air  Force 
Avionics  Laboratory  at  Wright-Patterson  Air 
Force  Base,  Texas  Instruments  is  designing  a 
programmable  image  processing  element  based 
on  distributed  arithmetic  using  n-channel  metal 
oxide  semiconductor  technology. 


REFERENCES 


1. Burrus, C.S., "Digital Filter Structures Described by Distributed Arithmetic," IEEE Transactions on Circuits and Systems, Vol. CAS-24, (December 1977) p 674.

2. DeMan, H.J., Vandenbulcke, C.J., and Van Cappellen, M.M., "High Speed NMOS Circuits for ROM-Accumulator and Multiplier Type Digital Filters," IEEE Journal of Solid-State Circuits, Vol. SC-13, (October 1978) p 565.

3. Classen, T.A.C., Mecklenbrauker, W.F.G., and Peek, J.B.H., "Some Considerations for the Implementation of Digital Systems for Signal Processing," Philips Research Reports, Vol. 30, (1975) p 73.

4. Eversole, W.L., Mayer, D.J., Frazee, F.B., and Cheek, T.F., "Investigation of VLSI Technologies for Image Processing," Proc. Image Understanding Workshop, Menlo Park, California, April 1979, p 159.




APPLICATION OF LSI AND VLSI TO IMAGE UNDERSTANDING ARCHITECTURES

S.D. Fouse, G.R. Nudd, and V.S. Wong

Hughes Research Laboratories
3011 Malibu Canyon Road, Malibu, California 90265


ABSTRACT


[...] work undertaken at Hughes Research Laboratories (Malibu, California) in support of the DARPA Image Understanding (IU) program. This report covers the period from October 1979 through April 1980 and, as such, represents a transition period for us. Our work prior to this period was concerned [...] and demonstration of large scale integration (LSI) microelectronic technology for image understanding. The principal [...] of this work were the design, fabrication, [...] of high-speed [...] processing with low-level operators. [...] continuing in that we are completing a performance evaluation of the [...] already implemented on the program [...] and interacting with other groups in the [...] program (including USC, MIT, and Stanford) to determine how these might perform in a full-scale [...] system. A second issue, one that is [...] of the program, [...] of the applicability of [...] scale integration (VLSI, >50,000 gates) [...] operators. Our aims and progress in this area are described below.


1. Introduction


[...] the performance investigation of the LSI circuits developed for the low-level operators [...] and the investigation and analysis of [...] of VLSI for intermediate and higher-level operators. We are completing a detailed performance [...] of the operators [...] on the program to characterize the [...] issues, including throughput, accuracy, and dynamic range. This is partially in response to the interest that has developed within the [...] in applying these circuits. This will also enable us to determine how they might [...] with either a host machine for the high[...] or operate as part of [...] more comprehensive [...] and have supplied sample circuits [...] Stanford.

[...] which has now become the central theme of our [...] application of VLSI to implement the [...] intermediate, and high-level processing. Before an effective VLSI architecture [...] configured, it is necessary to obtain a complete performance specification of candidate [...] together with an analysis [...] algorithms in terms of throughput, data bandwidth, local storage, etc. To this end, we have [...] finder, texture processor, and a segmentation scheme as candidate systems. As a base-line, [...] the algorithms developed by Nevatia,1 Laws,2 and Ohlander/Price3 for each system. In addition, we are investigating [...] well suitable to VLSI and may allow increased [...] conventional binary [...]. [...] topic are given below.

2. Performance Evaluation of Test Chip III

[...] workshops, we have [...] of LSI primitives aimed specifically at implementation of the low-level operators. [...] throughput are listed in Table 1.


Table 1. CCD/MOS Circuits Developed on the IU Program

[Table contents not recoverable from the source scan.]



[...] we are continuing a performance analysis of each of the [...] developed. Particular emphasis is being given to Test Chip III, which has five functions including a 5x3 programmable convolution, a 7x7 programmable convolution currently implemented as an edge detector, a [...] operator, a [...] sort for median filtering, and a 26x26 bipolar convolution. Preliminary [...] was programmable, the sort circuit [...]. [...] investigating the following topics: dynamic [...] of programmable weights on the convolutions, [...] range of the functions over the full [...], performance evaluation [...], dynamic range of the median operator, and the maximum speed of each device. The [...] be made available to the community.

3. Development of Intermediate-Level VLSI Image Understanding System

[...] phase of [...] program is an investigation of the impact and benefits of VLSI [...] special machinery will be [...] to achieve both the throughput and the [...] relatively unsophisticated systems [...] capability. The advent of VLSI [...] microelectronic [...] problems. However, [...] anticipated that many of the basic concepts used [...] and previous generations of processors [...] valid in the era of VLSI technology. A significant [...] reduce the necessary gate count [...] cost [...] reliability. This constraint may well be [...] where the gate [...]. A good case has been made that the data [...] and interconnects may be the [...] in future machines where gates are essentially [...] delay times, and design effort.

[...] involves determining the algorithms [...] potential conflicts and bottlenecks [...] to incorporate [...] parallelism and concurrency. With the [...] developing [...] simulation capability [...].

Figure 1. Program Plan for Development of Intermediate Level VLSI Image Understanding System


A. [...] Graph Analysis of Candidate Systems

[...] processing [...] architectures that utilize commonality [...]. The systems chosen were a line finder [...]. [...] the following algorithms were chosen:

(1) Nevatia line finder1
[...]

Once the graphs are co[...] the potential parall[...] algorithms as well as the commo[...] systems. For example, for [...] is apparent that the pro[...] convol[...] through to [...] independent signal paths [...] partition would be to [...] the normalization, and the [...] measure on a single ch[...] chips [...]. [...] advantage of this is that the communication to [...] minimized. Several similari[...] obvious [...] parallel convolutions in both [...] is between the edge-li[...] line finder and the connected [...] segmentation system. Both [...] data points [...] either a line or a region.

Figure 2. Nevatia Line Finder

Figure 4. Ohlander [...]




B. Residue Techniques for an IU Processor


[...] as a possible candidate for the [...] high throughput [...] required [...] operations [...] modular nature of the processor, which permits the dynamic range to be [...] simply [...] additional parallel processing structure using a different base. An attractive [...] of this technique is that it eliminates the "carry" [...] associated propagation delay. Finally, the increased speed [...] using a radix higher than the conventional binary structure is advantageous. [...] residue can be considered as a bridge [...] high-accuracy, low-speed digital [...].

[...] three steps are involved. First, the binary operands must be converted to remainders [...]. Next, these [...] added in a modular fashion (i.e., the [...] remainders must also be a remainder, which is to [...] take the remainder of the sum). Finally, the result must be converted back to bi[...] seems to be the most difficult of the three steps. In addition to this overhead, [...] difficult to make decisions while [...] residue representation since, unlike bi[...]
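The three steps named above (conversion to remainders, modular arithmetic on the remainders, and conversion back to binary) can be sketched in Python as follows. This is a generic residue-number-system illustration, not the processor studied in the paper: the moduli set is an arbitrary pairwise-coprime choice, the function names are hypothetical, and the conversion back uses the Chinese remainder theorem.

```python
from math import prod

MODULI = (7, 11, 13, 15)   # assumed pairwise-coprime bases; range = 15015


def to_residues(x, moduli=MODULI):
    """Step 1: convert a non-negative integer to its remainders."""
    return tuple(x % m for m in moduli)


def residue_add(a, b, moduli=MODULI):
    """Step 2: add channel by channel; no carry passes between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def residue_mul(a, b, moduli=MODULI):
    """Multiplication is likewise independent in every channel."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_residues(r, moduli=MODULI):
    """Step 3: convert back to binary with the Chinese remainder theorem."""
    total_range = prod(moduli)
    total = 0
    for ri, m in zip(r, moduli):
        partial = total_range // m
        total += ri * partial * pow(partial, -1, m)  # modular inverse
    return total % total_range


# Hypothetical 3-point weighted sum carried out entirely in residue form.
weights = [1, 2, 1]
pixels = [10, 20, 30]
acc = to_residues(0)
for w, p in zip(weights, pixels):
    acc = residue_add(acc, residue_mul(to_residues(w), to_residues(p)))
weighted_sum = from_residues(acc)
```

Each channel works on small operands, which is the source of the speed advantage, while the result is only meaningful if it stays below the product of the moduli. The sketch also makes the decision problem visible: comparing two residue tuples for magnitude is not possible without first converting back.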

The overhead costs associated with the [...] steps [...]. Often it is easier to do several [...] small operands than to do one operation with large operands. So if enough operations [...] done between the two conversion steps, the process as a whole could be cheaper to do than the [...] process using binary arithmetic. For IU work, [...] of the computation is done at the low level and consists mostly of signal processing requiring no decisions to be made. For algorithms [...] line finder that require several [...] convolutions, there are numerous operations [...] be performed in the residue form. Additionally, we are dealing with integer quantities, which are ideally suited to residue techniques. [...] building hardware, a residue [...] simple logic performing [...]

[...] program is to examine the different systems that [...] chosen and to determine the feasibility of [...] residue technique. The study will be [...] in a parametric fashion, so that, by using the parameters of a particular system, one can [...] residue will be an advantage. Areas that we have already begun to look at or will be [...] conversion logic, dynamic range requirements [...] (rounding in residue makes little sense), [...] logic for adding and multiplying in [...], methods for scaling intermediate results, and methods of representing the numbers in residue to facilitate [...] the logic for calculations.

C. ECSS Simulation

[...] microelectronic circuits increases, the systems built on these chips may become [...]. To optimize the design and to [...] performance, it is desirable to simulate the operation before the circuits are actually constructed in hardware. For our work [...] design at HRL, we have chosen to use a [...] ECSS [...].

[...] is written in a natural English-like [...] a procedural format, and the syntax of the [...] constructed so it can compactly define and specify the [...] of a computer system (storage devices, I/O devices, CPUs, printers) [...] interactions [...] the components (job scheduling, [...] allocation, [...]). [...] indicate performance and bottlenecks of the system being modeled, ECSS outputs statistics [...] the jobs activated, percent utilization [...] devices, and length of queues for the various resources. The program provides a variety of techniques for modeling systems, and the user, depending on his interests, may focus on modeling program behavior, system I/O transfers, or arbitrary activities running on a particular device. [...] definitions of ECSS [...] (devices, jobs, [...] transmissions) and add new commands as desired. So, when the models provided by ECSS [...]

D. Preliminary Definition of Line-Finder [...]

[...] the line-finder system would require.

The gate count has been divided [...] for memory and gates for random logic. These figures, along with a state-of-the-art number of gates per chip for both random logic and memory, were then used to determine [...] approximate total number of chips for [...]. Additionally, we calculated a chip count [...] predicted 1985-1990 technology. This [...] each block on the directed graph. These functions include:


(1) Kernel generation
(2) 5x5 convolutions
(3) Edge detection
(4) Edge thinning
(5) Predecessor and successor generation
(6) P and S memory
(7) Edge linking.




Rough designs were performed for the convolutions, edge detection, and edge thinning to estimate the number of gates. The P and S generation can be estimated by replicating the edge-thinning logic six to ten times, since it is making similar comparisons as for edge thinning but more of them. The logic for edge linking is involved, since it must be able to generate addresses for random access of the P and S memory. In addition, logic for each pass must be generated, since each pass performs a different function. So for the purposes of this sizing, the edge linking was not [...]. Table 2 shows the gate counts for each of the functions. Notice that, for the convolutions, we sized two types, one with no multiplies and one with six multiplies. This is because the algorithm we are sizing has six convolutions, two of which can be scaled such that all the weights are either zero or one, and four which can be scaled such that all but six of the weights are either zero or one.

Table 2. Gate Count for Line-Finder

Function                     Gate Count              Number Used   Total
5 Line Kernel                4x512x8 = 16K bits      1             16K bits memory
5x5 Conv, No Multiplies      1.25K                   2             2.5K
5x5 Conv, 6 Multiplies (2)   13.25K                  4             53K
Edge Detection               .35K                    1             .35K
3 Line Kernel                2x512x12 = 6K bits      2             12K bits memory
Edge Thinning                .35K                    1             .35K
P and S Generation           3K                      1             3K
P and S Memory               512x512x10 = 2.5 Mbit   1             2.5 Mbit memory

1 - Assumed image size = 512x512
2 - Multiplies accomplished using 8x256 bit ROM.

The current state of the art for MOS technology allows us to put 64K bits of memory or 20,000 gates of random logic on a single chip. Using these figures, we are able to partition the algorithm onto a set of chips and thus calculate a total number of chips for the system. One possible partitioning is as follows:

Chip    Functions
1       5 line kernel, 2 convolutions (no multiply)
2-5     One 5x5 convolution (6 multiplies) per chip
6       Edge detection, 3 line kernel, thinning logic
7       3 line kernel, P and S generation
8-47    P and S memory (40 64K chips)
48      Edge linking logic


The two pacing items (not counting the edge linking logic, which we have not looked at) are the P and S memory and the convolution calculations. The convolution complexity can be simplified by reducing the accuracy required on the weights of the kernel. This would imply that a logic multiplier could be fast enough, and the memory for look up table multiplies would not be [...]. [...] the P and S memory is embedded in the algorithm, and thus the algorithm would need to be altered if we wanted to reduce this memory.

We can perform this same analysis assuming some future technology. We
expect that in several years it will be possible to achieve 1 Mbit of
memory or [illegible] gates of random logic on a single chip, with a
[illegible] feature size as compared to 5 [illegible] features for
current technology. For this case, we get the following partitioning:

Chip    Function
1       All random logic and kernel generation memory
2-4     P and S memory.

Our present configurations indicate that we are not pin limited for
this particular operator and hence this partitioning is practical.

It should be emphasized that these estimates are very preliminary and
that the effort on these algorithms and the other candidate systems
will continue during the next period.

REFERENCES

1. R. Nevatia, K. Ramesh Babu, 'An Edge Detection, Linking and Line
Finding Routine', USC I.P.I. Semiannual Report, September, 1978.

2. K. I. Laws, 'Textured Image Segmentation', PhD Thesis, USC,
January, 1980.

3. R. Ohlander, K. Price, D. Raj Reddy, 'Picture Segmentation Using a
Recursive Region Splitting Method', Computer Graphics and Image
Processing, 1978.

4. G.R. Nudd, P.A. Nygaard, S.D. Fouse, T.A. Nussmeier,
'Implementation of Advanced Real-Time Image Understanding
Algorithms', Proceedings Image Understanding Workshop, DARPA,
April, 1979.

5. G.R. Nudd, S.D. Fouse, T.A. Nussmeier, P.A. Nygaard, 'Development
of Custom-Designed Integrated Circuits For Image Understanding',
Proceedings of Image Understanding Workshop, DARPA, November, 1979.




A "NON-CORRELATION" APPROACH TO IMAGE-BASED VELOCITY DETERMINATION

O. Firschein

Lockheed Palo Alto Research Laboratory
Palo Alto, CA 94304

M. Oron*

Department  of  Aeronautical  Engineering 
Technion,  Israel  Institute  of  Technology 
Haifa,  Israel 


ABSTRACT 

Determination of the ground velocity of a vehicle using
time-sequential images is part of the Passive Navigation study
dealing with autonomous navigation of a vehicle using passively
sensed images. This paper discusses a "non-correlation" intensity
gradient approach to velocity determination, and shows in an appendix
that such approaches are indeed related to correlation. A low cost
"velocity meter" is described that is based on intensity gradients
and uses a linear solid state sensor.


INTRODUCTION 

The determination of the ground velocity of a slow-flying aircraft
using passively sensed images has been one subsystem under
investigation in the Passive Navigation Study(1). Phase correlation
or area correlation is often used to determine the displacement
between a time sequenced pair of images, and then, knowing the time
increment, the relative altitude, and the orientation of the vehicle,
one can determine the ground velocity.

Prof. M. Oron, Consultant to Lockheed's Independent Research program,
suggested a technique for velocity determination that uses intensity
gradients rather than correlation. The basic idea is that the
displacement can be found using the intensity/time derivative between
corresponding pixels in a time-sequenced pair of images (the
inter-frame differences), and the intensity/space derivative for each
pixel in a frame (the intra-frame difference). A derivation of this
will be given in the description of the velocity meter.


It was later realized that the Oron method is related to other
"non-correlation" approaches (2,3,4), particularly those used in TV
data compression. Such techniques are applicable when small pixel
displacements between frames exist, as in the TV application, or in
the case of a slow-flying aircraft sensing the ground scene at TV
rates. An analysis of the basic approach due to Limb and Murphy(2) by
Dr. W. G. Eppler of LPARL showed that there is indeed a relationship
to correlation; his proof is given in Appendix A.


* During 1979-80 Academic Year: Visiting Scholar at Department of
Aeronautics and Astronautics, Stanford University, CA 94305.

The balance of the paper is devoted to a summary of the velocity
meter based on the Oron approach. The full description is given in
Ref. 5.

THE GROUND VELOCITY PROBLEM

Continuous on-board determination of ground velocity, followed by
time integration, can be used for autonomous vehicle navigation. In
realizing such a dead-reckoning system, it is necessary to minimize
error accumulation, mainly due to changes in the vehicle's attitude.
If a two-dimensional imaging device such as a TV camera were used for
the velocity determination, it would be necessary to mount it on a
gimballed inertial platform similar to those used in
accelerometer-based navigation systems. This could result in an
instrumentation package which might be more massive, expensive and
limited in application than conventional Inertial Navigation Systems
(INS), but less accurate. Image-based velocity determination for
navigational purposes can, therefore, become practical only if it is
realized at much lower cost and weight than INS. This is particularly
true when considering the most likely types of aircraft in which such
systems would be mounted in order to provide them with autonomous
passive navigation(1) capabilities: the miniature Remotely Piloted
Vehicles (mRPV), some of which are less expensive than an advanced
INS.

In the system described below, simplicity, use of dimensionally small
and inexpensive components, and exploitation of other instruments
usually installed in an mRPV are emphasized. A solid state
electro-optical line sensor, such as the linear array CCD, which is
inherently small and relatively inexpensive, is used for imaging.
Since this sensor is amenable to electronic compensation of attitude
changes of autopilot controlled aircraft, as has been recently
demonstrated by Oron and Abraham(6), the necessity for
electro-mechanical and gyroscopic stabilization of the optical axis
is eliminated, thus avoiding a heavy cost and weight penalty.

An additional advantage of one-dimensional imaging is in the much
lower rate of data generation as compared with the two-dimensional
case. This makes it possible to use modest computational power and
limited storage memory in the digital signal processing phase.
Furthermore, the computational method is based on brightness
differences, rather than on two-dimensional correlation, and is not
only less scene-dependent, and therefore less limited in scope, but
also requires significantly less time and memory.

BASIC  CONCEPT 

A vertical cross section through an airborne imaging system based on
a line sensor containing $M$ elements, each giving rise to a pixel
(picture element) of size $\delta$ and intensity $I_{n,i}$, is shown
in Figure 1. The $i$-index ($i = 1,\ldots,M$) designates the position
of the element (or pixel) along the sensor axis, $x$, which lies in
the focal plane of the lens, at an angle $\gamma$ to the longitudinal
axis of the aircraft, $x_a$ (see Figure 2). The $n$-index
($n = 1,\ldots,N$) designates the pixel's time of occurrence:

$$t_n = n\tau \qquad (1)$$

where $\tau$ is the exposure time of the sensor between readings.
Thus, every $\tau$ milliseconds the sensor generates $M$ pixels or
readings of intensity. The $I_{n,i}$ are discrete samplings of a
two-dimensional image brightness function, $f(x,y)$, taken along the
x-axis at time $t_n$. Since $x$ and $y$ vary with time because of the
aircraft's motion relative to ground (the ground velocity, $V$), the
intensity readings will also vary with time.

It is easier to visualize this variation as being caused by a motion
of the whole brightness function $f(x,y)$ in the focal plane in a
direction opposite to that of the ground velocity, $V$. This "focal
velocity" is given by:

$$v = -\frac{F}{H}\,V \qquad (2)$$

where $F$ is the focal length of the lens and $H$ is the aircraft
altitude relative to ground. Since $F$ is constant and known and $H$
is measured independently by other on-board instrumentation, the
extraction of $v$ from the $I_{n,i}$ readings will make it possible
to continuously determine and integrate $V$ with respect to time.
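
As a numerical illustration of equation (2), the sketch below evaluates the focal velocity for the typical flight conditions quoted later in the paper (V = 80 kt, H = 3000 ft, F = 100 mm) and compares it with the unit velocity of one pixel per exposure (12.5 µm pixels, 2 msec exposure). The function names and the unit-conversion constants are assumptions introduced for this example only.

```python
KNOT_IN_M_PER_S = 1852.0 / 3600.0   # one knot in metres per second
FOOT_IN_M = 0.3048                  # one foot in metres


def focal_velocity(ground_speed, altitude, focal_length):
    """Equation (2): image-plane velocity, opposite in sign to ground velocity."""
    return -(focal_length / altitude) * ground_speed


def unit_velocity(pixel_size, exposure_time):
    """One pixel per exposure time in the minus-x direction (v_p in the text)."""
    return -pixel_size / exposure_time


if __name__ == "__main__":
    V = 80 * KNOT_IN_M_PER_S     # ground speed, m/s
    H = 3000 * FOOT_IN_M         # relative altitude, m
    F = 0.100                    # focal length, m
    v = focal_velocity(V, H, F)
    v_p = unit_velocity(12.5e-6, 2e-3)
    print(f"v = {v * 1e3:.2f} mm/s, v_p = {v_p * 1e3:.2f} mm/s, ratio = {v / v_p:.2f}")
```

Under these assumptions the focal velocity has a magnitude of about 4.5 mm/sec, and the ratio to the unit velocity is about 0.72 of a pixel per exposure.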

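As mentioned above, $f(x,y)$ is an implicit function of $t$. For
those parts of this function which are also continuous and analytic,
it is possible to calculate the total time derivative:

$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} \qquad (3)$$

From Figure 2 and equation (2) the following relations are derived:

$$\frac{dx}{dt} = v_x = -\frac{F}{H}\,V_x \qquad (4)$$

$$\frac{dy}{dt} = v_y = -\frac{F}{H}\,V_y \qquad (5)$$

$$\tan\rho = \frac{v_y}{v_x} \qquad (6)$$

Substituting (4)-(6) in (3), an expression for $v_x$ is obtained:

$$v_x = \frac{df/dt}{\dfrac{\partial f}{\partial x} + \tan\rho\,\dfrac{\partial f}{\partial y}} \qquad (7)$$

For the simple case of $v_y = 0$, i.e., when the sensor is aligned
exactly along the ground velocity vector ($\rho = 0$), equation (7)
becomes:

$$v = v_x = \frac{df/dt}{\partial f/\partial x} \qquad (8)$$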

The time and space derivatives in (8) can be approximated by
brightness difference expressions, designated $\Delta^0 I$ and
$\Delta^1 I$ respectively. The $\Delta^0 I$ expression is related to
the total time derivative at $x = x_i$ and is calculated from
successive intensity readings ($I_{n-1,i}$, $I_{n,i}$, $I_{n+1,i}$)
of the $i$th element:

$$\frac{df}{dt} \approx \frac{\Delta^0 I}{\tau} = \frac{I_{n,i} - I_{n-1,i}}{\tau} \qquad (9)$$

Similarly, $\Delta^1 I$ is related to the partial space derivative,
or brightness gradient, calculated at time $t = t_n$:

$$\frac{\partial f}{\partial x} \approx \frac{\Delta^1 I}{\delta} = \frac{I_{n,i} - I_{n,i-1}}{\delta} \qquad (10)$$

Substitution of (9) and (10) into (8), while accounting for the
optical sign reversal between $V$ and $v$, leads to:

$$v_{n,i} = -\frac{\Delta^0 I}{\Delta^1 I}\,\frac{\delta}{\tau} \qquad (11)$$

where $v_{n,i}$ is the focal velocity calculated at the $i$th element
at time $t_n$. In order to simplify (11) and eliminate the minus
sign, a negative unit vector velocity, $v_p$, equal to one pixel,
$\delta$, per one exposure time, $\tau$, in the minus $x$ direction
is defined:

$$v_p \triangleq -\frac{\delta}{\tau} \qquad (12)$$

and substituted into (11) yielding:

$$v_{n,i} = \frac{\Delta^0 I}{\Delta^1 I}\,v_p \qquad (13)$$

A further simplification is obtained by defining a dimensionless
focal length velocity coefficient, $\nu$, as follows:

$$\nu_{n,i} \triangleq \frac{v_{n,i}}{v_p} \qquad (14)$$

so that

$$\nu_{n,i} = \frac{\Delta^0 I}{\Delta^1 I} \qquad (15)$$

[Figure 1. Vertical cross-section of airborne imaging system (sensor is parallel to ground velocity)]

[Figure 2. Coordinate systems projected onto the ground plane. Labels: Y (north), heading, bearing, alignment angle, misalignment angle]

If $M$ and $N$ are large enough, a statistically satisfactory
distribution of the $\nu_{n,i}$ values is achieved, enabling a good
estimate of the true value of $\nu$. One simple such estimate would
be the arithmetic mean:

$$\bar{\nu} = \frac{1}{(M-1)(N-1)} \sum_{i=2}^{M} \sum_{n=2}^{N} \frac{\Delta^0 I_{n,i}}{\Delta^1 I_{n,i}} \qquad (16)$$

Thus, for each pixel the ratio of the time derivative to the space
derivative is determined, and the ratios are averaged over the frame.
This differs from the Limb and Murphy approach(2), in which the time
derivative and the space derivative are each averaged, and the
estimate is obtained by dividing the two averages.
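
The per-pixel ratio and its arithmetic mean can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name, the list-of-lists frame layout, and the two gradient thresholds (which anticipate the "flat" and "sharp edge" rejection discussed in the next section) are assumptions, and the mean is taken over the accepted ratios only instead of over all (M-1)(N-1) terms.

```python
def velocity_coefficient(frames, min_gradient=1.0, max_gradient=50.0):
    """Mean ratio of time difference to space difference over all pixels.

    frames is a list of exposures, each a list of M intensity readings.
    Ratios whose space difference is too small ("flat" scenery) or too
    large ("sharp edges") are skipped; both thresholds are illustrative.
    Returns None when no pixel passes the thresholds.
    """
    ratios = []
    for n in range(1, len(frames)):
        previous, current = frames[n - 1], frames[n]
        for i in range(1, len(current)):
            d_time = current[i] - previous[i]       # inter-exposure difference
            d_space = current[i] - current[i - 1]   # inter-pixel difference
            if min_gradient <= abs(d_space) <= max_gradient:
                ratios.append(d_time / d_space)
    if not ratios:
        return None
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # A linear brightness ramp that moves half a pixel per exposure.
    ramp = [[10 + 2 * (i + 0.5 * n) for i in range(16)] for n in range(3)]
    print(velocity_coefficient(ramp))
```

For the ramp in the usage example every accepted pixel gives the same ratio, so the mean equals the half-pixel-per-exposure motion built into the test pattern.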

PROPOSED INSTRUMENTATION SETUP

[illegible] of the basic concept described above, an instrumentation
setup which includes some components [illegible]. Employing the
[illegible] elements) used by Oron and Abraham(6) in their system
($\delta$ = 12.5 µm, $\tau$ = 2 msec) with optics similar to theirs
($F$ = 100 mm) but [illegible] an imaging geometry shown in Figure
[illegible], the unit velocity is $v_p$ = -6.25 mm/sec. The focal
velocity for typical flight conditions ($V$ = 80 kt, $H$ = 3000 ft)
is $v$ = 4.5 mm/sec and the velocity coefficient is $\nu$ = 0.72. As
the ground velocity will vary between 40 and [illegible] kt, $\nu$
will be between 0.36 and 0.96. In other words, the translation along
the sensor's axis between exposures will be between about a third of
a pixel and just less than a whole pixel. The translation per
exposure due to pitch and roll instability is usually less than 10%
of that, but its accumulation can be compensated for using signals
from the vertical [illegible] installed in the autopilot system, as
previously described(6). For that purpose only 512 readings out of
the 1024 will be actually processed in the computational procedure
for determining $\nu$.

The 511 interpixel differences, $\Delta^1 I$, will be computed
on-line during the exposure time $\tau$ (of the order of 2 msec) and
[illegible] special register together with the 512 readings of $I$.
At the end of the second exposure time ($n$ = 2) the 511
inter-exposure differences $\Delta^0 I$ will also be computed
[illegible]. Dividing the latter by the former will yield a maximum
of 511 $\nu$ values [illegible]. $\Delta^1 I$ denominator values
which are lower than a certain threshold will first be eliminated in
order to prevent erroneously high $\nu$ results caused by scenery
which is too low in contrast ("flat") to yield a [illegible] spatial
brightness gradient. Secondly, very high or abrupt gradients ("sharp
edges") will also be discarded since they represent discontinuities
in the $f(x,y)$ brightness function or in its first derivative
[illegible].

In a period of about $T$ = [illegible], some 20,000 values
[illegible] will be computed [illegible]. This is a relatively large
and statistically controlled population of results which assumes a
Gaussian distribution, provided that only random errors (or
non-correlated noise) occur [illegible] all the systematic errors are
[illegible].

An important systematic, non-random error may arise as a result of
the misalignment between the sensor and the direction of the ground
velocity vector, as can be seen from comparison of equations (7) and
(8). To eliminate such misalignment, the sensor could be mounted
[illegible] in the focal plane around an axis of rotation which is
perpendicular to the [illegible] parallel to the [illegible] axis of
the aircraft. The rotational motion can be [illegible] by a small
stepping motor which [illegible] faster than 100 Hz (10 msec per
step) or more [illegible] positioned [illegible] than within 1°.
Thus, if the time spent at each angle is 80 to 90 ms, about 5° to
each side of the direction of motion can be checked in one second
[illegible] this correction. After the angular scanning, about 40
different sets of computations, each consisting of more than 10,000
valid values, will be obtained. The set which will have the best
statistical distribution of calculated values around the mean (the
most Gaussian-like, symmetric histogram) will be the one with the
smallest systematical or directional bias error and, therefore, the
sensor will then be most closely aligned with the ground velocity
direction.
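
One way to make the "most symmetric histogram" criterion concrete is to score each angular position by the skewness of its velocity-coefficient samples and keep the angle whose skewness is closest to zero. The paper does not say which symmetry statistic was used, so the choice of sample skewness, the function names, and the dictionary layout below are all assumptions made for this sketch.

```python
def skewness(values):
    """Sample skewness (third standardized moment); 0 for a symmetric set."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0.0:
        return 0.0  # degenerate case: all samples identical
    return m3 / m2 ** 1.5


def best_aligned_angle(samples_by_angle):
    """Return the sensor angle whose samples have the least skewed histogram.

    samples_by_angle maps a sensor angle (degrees) to the list of
    velocity-coefficient values collected while the sensor sat at that angle.
    """
    return min(samples_by_angle,
               key=lambda angle: abs(skewness(samples_by_angle[angle])))


if __name__ == "__main__":
    scan = {
        -2.0: [0.4, 0.4, 0.4, 0.9],   # long tail toward high values
        0.0: [0.4, 0.5, 0.6],         # symmetric about its mean
        2.0: [0.1, 0.6, 0.6, 0.6],    # long tail toward low values
    }
    print(best_aligned_angle(scan))
```

Skewness is only one possible measure; a fuller treatment might also test kurtosis or fit a Gaussian to the histogram before choosing the angle.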

[illegible] sensor on a [illegible] additional practical advantage;
it enables correction for sudden changes in the yaw or heading
[illegible] measured by an on-board compass. The output of this
instrument will be processed to provide a correctional signal which
can be fed into the stepping motor controller. Thus, whenever a
[illegible] change in yaw occurs, mainly due to gusts or [illegible],
the stepping motor will compensate for it immediately (a 100 Hz rate
is much faster than required by the flight dynamics of an mRPV).

[illegible]-following or tracking was discussed. The [illegible]
setting up the sensor along the direction of the ground velocity
vector can be solved within 5 to 10 seconds. In the [illegible] a
"brute force" scan at [illegible] directions (180° of angles) to an
accuracy of 2° [illegible], but it is more likely that the ground
velocity direction will be found sooner than that. In fact, a
strategy of starting from the heading direction of the aircraft and
scanning systematically on both its sides will lead [illegible] to
the plane's bearing angle, the ground velocity direction relative to
the North, since in most cases the heading and bearing are not too
[illegible].




DISCUSSION 

A computer-simulated experimental investigation was carried out to
test these ideas. The simulation, described in Ref. 5, indicates that
while working at $\nu$ values close to unity, it is possible to
achieve high directional sensitivity as well as magnitude accuracy.
In order to approach such $\nu$ values it is necessary to introduce a
change in $\tau$ as $V$ varies; thus after $V$ has been calculated, a
signal is fed back to the sensor controller which changes its
frequency between 200 KHz and 500 Hz.

The overall accuracy of the ground velocity determination, and hence
of the whole dead-reckoning navigational system, is perhaps more
critically dependent on the auxiliary readings of the altitude, $H$,
and the heading angle, $\psi$, than on the new electro-optical
sensor-based part of the system. If a passive absolute altitude
barometric altimeter is used, an accuracy of approximately 1% can be
achieved; however, the uncertainty in terrain elevation data
prestored in the system and subtracted from $H$ to yield relative
altitude may raise that figure to a 2% level, which for the $V$ range
is worse than the $\nu$ accuracy (about 1%). Experiments indicate
that an accuracy of about 1° can be achieved for the angle of
velocity, compatible with the compass $\psi$ readings. Further
experiments with hardware rather than computer simulated motion are
needed to establish whether a better accuracy in $\gamma$ can be
accomplished, justifying the use of more accurate instrumentation for
$\psi$ measurements.

In summary, it can be stated that ground velocity determination
on-board mRPVs using electro-optical line sensors in combination with
existing instruments is not only feasible, but quite practical,
sufficiently accurate and not too expensive (in terms of weight and
cost). The velocity values are computed at a rate of 5 to 10 per
second, which is much higher than necessary for the relatively slow
flight dynamics of an mRPV. This enables us, by using fairly simple
filtering and prediction techniques in the navigator prior to
integration, to raise the level of confidence in this dead-reckoning
navigation system as well as to overcome short "dark" periods when,
due to cloud coverage or extremely "flat" scenery, no velocity
computations can be made. In this context it should be emphasized
that the method proposed here is by no means limited to the visual
spectrum; any electro-optical line sensor can be used, thus perhaps
expanding the range of applicability to include overcast days as well
as night operation.

ACKNOWLEDGMENTS 

The experimental work was performed in the Signal Processing
Laboratory of the Lockheed Palo Alto Research Laboratory. We would
like to thank Dr. J. J. Pearson, head of the group, Dr. W. G. Eppler,
Dr. K. Dutta, and Mr. C. D. Kuglin for many helpful discussions and
suggestions. One of us (M.O.) would like to thank Professor D. Debra,
head of the Flight Control and Instrumentation Laboratory of the
Department of Aeronautics and Astronautics, Stanford University, for
his kind hospitality and most constructive comments on this
presentation. The authors would like to express their gratitude to
L. Staley for speed and accuracy in the preparation of the several
drafts and the final manuscript.

REFERENCES 


1. O. Firschein, D. Gennery, D. Milgram and J. J. Pearson, "Progress
in Navigation Using Passively Sensed Images," Image Understanding
Workshop, Menlo Park, CA, April 1979.

2. J. O. Limb and J. A. Murphy, "Estimating the Velocity of Moving
Images in Television Signals," Computer Graphics and Image
Processing, 1975, 311-327.

3. C. Cafforio and F. Rocca, "Methods for Measuring Small
Displacements of Television Images," IEEE Trans. Inform. Theory,
IT-22, No. 5 (Sept. 1976).

4. A. N. Netravali and J. D. Robbins, "Motion-compensated Television
Coding," The Bell System Technical Journal, Vol. 58, No. 3, March
1979.

5. [illegible] "Velocity Determination by Digital Processing of
Electro-Optical Line Sensor Signals," Proc. SPIE, Vol. 219, 1980
(SPIE Technical Symposium, Los Angeles, CA, Feb. 1980).

6. M. Oron and M. Abraham, "Analysis Design and Simulation of Line
Scan Aerial Surveillance Systems," Proc. SPIE, Vol. 219, 1980 (SPIE
Technical Symposium, Los Angeles, CA, Feb. 1980).




APPENDIX A

RELATION OF CORRELATION TO THE FRAME-DIFFERENCE, ELEMENT-DIFFERENCE
METHOD OF VELOCITY DETERMINATION IN TELEVISION SIGNALS

W. G. Eppler

Lockheed Palo Alto Research Laboratory, Palo Alto, CA 94304


A1. INTRODUCTION

Limb and Murphy(2) describe a method for estimating the velocity of
moving images in television signals. The displacement between two
frames, for small displacements, is given by the equation,

$$x = \frac{\sum |FDS|}{\sum |EDS|} \qquad (A\text{-}1)$$

where $x$ is the displacement between the images,

FDS is the frame difference (the intensity difference between two
successive frames at a particular pixel), and

EDS is the element difference (the difference between an element and
its left neighbor).

Later papers by other authors(3,4) provide extensions to the original
Limb/Murphy equations. This appendix shows the relation of
correlation to the Limb/Murphy results.


A2. DERIVATION OF THE CORRESPONDENCE

First we assume that the correlation curve is linear between 0 and 1
sample interval, as shown in Fig. A-1.

[Fig. A-1 Assumed Linear Correlation Curve for $0 \le x \le 1$]

If we know the correlation values $\phi(0)$, $\phi(x)$, and
$\phi(1)$, we can interpolate the $x$ value as

$$x = \frac{\phi(0) - \phi(x)}{\phi(0) - \phi(1)} \qquad (A\text{-}2)$$

Now we substitute expressions for the three correlation values,
noting that $\phi(0)$ can be written in two ways, since the image is
stationary:

$$\phi(0) = \sum_i I^2(i) = \sum_i I^2(i + x) \qquad (A\text{-}3)$$

$$\phi(x) = \sum_i I(i)\,I(i + x) \qquad (A\text{-}4)$$

$$\phi(0) - \phi(x) = \frac{1}{2}\sum_i \left[I^2(i) - 2\,I(i)\,I(i + x) + I^2(i + x)\right] = \frac{1}{2}\sum_i \left[I(i) - I(i + x)\right]^2 \qquad (A\text{-}5)$$

$$x = \frac{\sum_i \left[I(i) - I(i + x)\right]^2}{\sum_i \left[I(i) - I(i + 1)\right]^2} \qquad (A\text{-}6)$$

To the extent that $[\;\cdot\;]^2 \approx |\;\cdot\;|$,

$$x \approx \frac{\sum_i |I(i) - I(i + x)|}{\sum_i |I(i) - I(i + 1)|} \qquad (A\text{-}7)$$

$$\sum_i |I(i) - I(i + 1)| = \sum |EDS| \qquad (A\text{-}8)$$

To the extent that the displacement is in the x-direction only:

$$\sum_i |I(i) - I(i + x)| = \sum |FDS| \qquad (A\text{-}9)$$

Substituting A-9 and A-8 in A-7, we obtain,

$$x \approx \frac{\sum |FDS|}{\sum |EDS|} \quad \text{(Limb and Murphy)} \qquad (A\text{-}10)$$

[Fig. A-2 Effect of the One Pixel Horizontal Displacement for a Simple Figure]

Thus, the Limb/Murphy algorithm amounts to sub-pixel interpolation of
the correlation peak in the matchpoint neighborhood; the range of the
algorithm is the correlation distance. $\sum |EDS|$ controls the
correlation function slope, and depends on the projected lengths and
the intensity differences across vertical boundaries; in Fig. A-2 the
displaced rectangle of height $H$ results in
$\sum |EDS| = 2H\,|I_1 - I_2|$.




BOOTSTRAP  STEREO 


Marsha  Jo  Hannah 


Lockheed  Palo  Alto  Research  Laboratory 
Department  52-53,  Building  204 
3251  Hanover  Street 
Palo  Alto,  CA  94304 


ABSTRACT 

Over  the  past  two  years,  Lockheed  has  been 
working  in  navigation  of  an  autonomous  aerial 
vehicle  using  passively  sensed  images.  One  tech¬ 
nique  which  shows  promise  is  bootstrap  stereo,  in 
which  the  vehicle's  position  is  determined  from 
the  perceived  locations  of  known  ground  control 
points,  then  two  known  vehicle  camera  positions 
are  used  to  locate  corresponding  image  points  on 
the  ground,  creating  new  control  points.  This 
paper describes the components of bootstrap
stereo - camera calibration, new landmark selection,
techniques for efficient control point matching
from image to image, and control point
positioning.


INTRODUCTION 

Before the advent of sophisticated navigation aids such as radio
beacons, barnstorming pilots relied primarily on visual navigation. A
pilot would look out the window of his airplane, see landmarks below
him, and know where he was. He would watch the ground passing beneath
him and know how fast and in what direction he was moving. Unless the
ground was obscured by clouds or darkness, he did quite well at
navigating toward his destination.

Today, there exist applications for which a computer implementation
of this simple, visually oriented form of navigation would be useful.
One scenario hypothesizes a small, unmanned vehicle which must fly
accurately from its launch point to its target under possibly hostile
circumstances. This must be accomplished without relying on external
signals (which could be jammed) or emitted radiation (which could be
used to track and destroy the vehicle). Other options which have been
considered and rejected include a simple pre-programmed flight plan
(gross errors in course could accumulate from unpredictable wind
effects) and a high quality inertial guidance system (which would add
excessive weight and expense).

It  is  felt  that  the  current  state  of  the  art 
in  artificial  intelligence  and  image  processing, 
coupled  with  the  availability  of  small  solid  state 
sensors  and  microprocessors,  will  enable  us  to  use 
passively  sensed  terrain  imagery  in  a  visual 


navigation  system  small  enough  to  be  flown  in  an 
autonomous  aerial  vehicle  (AAV).  To  prove  feasi¬ 
bility,  we  are  implementing  and  refining  these 
techniques  in  computer  software. 

Our  overall  approach  to  the  problem  involves
providing  the  vehicle  with  a  Navigation  Expert 
having  approximately  the  sophistication  of  an 
early  barnstorming  pilot.  This  expert  will  navi¬ 
gate  partly  by  its  simple  instruments  (altimeter, 
airspeed  indicator,  and  attitude  gyros),  but  mostly 
by  what  it  sees  of  the  terrain  below  it. 

The Navigation Expert will consist of an Executive that weighs
evidence and makes final decisions, and a group of Specialists (see
Figure 1). Each of these Specialists provides information on its area
of expertise, along with measures of confidence in its results to aid
the Executive in resolving contradictory opinions from two or more
Specialists. The Specialists identified to date include an
Instruments Specialist to provide images and flight instrument
readings, a Dead Reckoning Specialist to estimate position from past
navigation data, a Landmarks Specialist to recognize checkpoint
landmarks, and a Stereo Specialist to perform a variety of stereo
photogrammetric tasks.

This report covers one aspect of the Stereo Specialist, a technique
which we call bootstrap stereo.

THE  BOOTSTRAP  STEREO  CONCEPT 

Given a set of ground control points with known real-world positions,
and given the locations of the projections of these points onto the
image plane, it is possible to determine the position and orientation
of the camera which collected the image, a process known to
traditional photogrammetrists as space resection [T]. Conversely,
given the positions and orientations of two cameras and the locations
of corresponding point-pairs in the two image planes, the real-world
locations of the viewed ground points can be determined, a process
known as space intersection [T]. Combining these two techniques
iteratively produces the basis for bootstrap stereo.
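
Space intersection can be illustrated with a small geometric sketch. The version below is not the photogrammetric formulation cited in [T]; it assumes each matched image point has already been converted into a viewing ray (camera position plus direction in world coordinates) and returns the midpoint of the shortest segment joining the two rays. The function names and the parallel-ray tolerance are hypothetical.

```python
def _dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def intersect_rays(c1, d1, c2, d2, eps=1e-12):
    """Closest point to two rays c1 + s*d1 and c2 + t*d2.

    c1, c2 are camera positions and d1, d2 are viewing directions, all in
    the same world coordinate system. Returns the midpoint of the common
    perpendicular, or None if the rays are (nearly) parallel.
    """
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    s = (b * e - c * d) / denom   # parameter along the first ray
    t = (a * e - b * d) / denom   # parameter along the second ray
    p1 = tuple(p + s * q for p, q in zip(c1, d1))
    p2 = tuple(p + t * q for p, q in zip(c2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


if __name__ == "__main__":
    camera_0 = (0.0, 0.0, 1000.0)
    camera_1 = (200.0, 0.0, 1000.0)
    ground = (100.0, 50.0, 0.0)
    ray_0 = tuple(g - c for g, c in zip(ground, camera_0))
    ray_1 = tuple(g - c for g, c in zip(ground, camera_1))
    print(intersect_rays(camera_0, ray_0, camera_1, ray_1))
```

With noisy matches the two rays do not meet exactly, which is why the midpoint of the shortest connecting segment is used here; the length of that segment could also serve as a rough quality check on a candidate landmark.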

Figure 2 shows an AAV which has obtained images at three points in
its trajectory. The bootstrap stereo process begins with the set of
landmark points, a and b, whose real-world coordinates are known. (In
reality, at least four points would be needed; only two are shown
here to simplify the diagram.) From these, the camera position and
orientation is determined for the image frame taken at Time 0.
Standard image-matching correlation techniques [H] are then used to
locate these same points in the second, overlapping frame taken at
Time 1. This permits the second camera position and orientation to be
determined.

Because the aircraft will soon be out of sight of the known
landmarks, new landmark points must be established whenever possible.
For this purpose, "interesting points" -- points with a high
likelihood of being matched [M] -- are selected in the first image
and matched in the second image. Successfully matched points have
their real-world locations calculated from the camera position and
orientation data, then join the landmarks list. In Figure 2,
landmarks c and d are located in this manner at Time 1; these new
points are later used to position the aircraft at Time 2. Similarly,
at Time 2, new landmarks e and f join the list; old landmarks a and
b, which are no longer in the field of view, are dropped from the
landmarks list.

Once initialized from a set of known landmarks,
bootstrap stereo has four components, which we will
discuss in the following sections:

1)  Camera Calibration -- determining the camera
position and orientation from known ground
control points,

2)  New Landmark Selection -- choosing potential
new ground control points,

3)  Point Matching -- pairing a point in one image
with its corresponding point in a second,
overlapping image,

4)  Control Point Positioning -- locating points
on the ground, given their positions in two
images and the relevant camera positions and
orientations.

CAMERA  CALIBRATION 

Given a set of ground control points with
known real-world positions (Xi,Yi,Zi), and given
the perceived locations of these
points on the image plane (Ui,Vi), it is possible
to determine the position (X0,Y0,Z0) and
orientation (HEADING,PITCH,ROLL) of the camera which took
the imagery [D&H], [T]. This is accomplished by a
least-squares solution of a set of collinearity
condition equations, effectively minimizing the
mean squared error between each image plane point
(Ui,Vi) and the projection (Ui',Vi') of that
point's real-world location (Xi,Yi,Zi) onto the
image (see Figure 3). Because the equations are
highly nonlinear, a solution is usually sought by
iterating on a linearization of the problem [G].
The solution is initialized from camera
orientation data provided by the Instruments Specialist
and a position estimate from the Dead Reckoning
Specialist.
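
The iteration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the study: the angle convention in `rotation`, the unit focal length, the forward-difference Jacobian, and all function names are hypothetical choices made for the example.

```python
import numpy as np

def rotation(heading, pitch, roll):
    """World-to-camera rotation from three angles in radians.

    The Z-Y-X ordering is an assumption made for this sketch.
    """
    ch, sh = np.cos(heading), np.sin(heading)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rz = np.array([[ch, -sh, 0.0], [sh, ch, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return (rz @ ry @ rx).T

def project(params, ground, focal=1.0):
    """Collinearity equations: ground points (X,Y,Z) -> image points (U,V).

    params = (X0, Y0, Z0, heading, pitch, roll).
    """
    position, angles = params[:3], params[3:]
    cam = (ground - position) @ rotation(*angles).T
    return -focal * cam[:, :2] / cam[:, 2:3]

def resect(ground, image, initial, focal=1.0, iterations=20, step=1e-6):
    """Iterate least-squares solutions of the linearized problem."""
    params = np.array(initial, dtype=float)
    for _ in range(iterations):
        predicted = project(params, ground, focal)
        residual = (image - predicted).ravel()
        jacobian = np.empty((residual.size, 6))
        for k in range(6):  # forward-difference derivative of each parameter
            bumped = params.copy()
            bumped[k] += step
            jacobian[:, k] = (project(bumped, ground, focal)
                              - predicted).ravel() / step
        update = np.linalg.lstsq(jacobian, residual, rcond=None)[0]
        params += update
        if np.linalg.norm(update) < 1e-10:
            break
    return params
```

The `initial` argument plays the role of the starting estimate supplied by the instruments and dead reckoning; with a poor starting estimate an iteration of this kind may fail to converge.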

This technique is somewhat sensitive to
invalid points which may appear in its data set.
Consequently, each camera solution should be
checked to see if any of the points are
contributing excessively to the residual error. Such
points should be removed from the data set and the
solution redone, to avoid major errors. Indeed,
this editing process usually has to be iterated to
obtain maximum data reliability.
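
A minimal sketch of such an editing loop follows, written for a generic model so that it stays self-contained. The rejection rule (drop the worst point while its residual exceeds a multiple of the median residual) and the names `edit_and_refit`, `factor`, `floor`, and `min_points` are assumptions for illustration; the text does not say how "contributing excessively" was judged.

```python
import numpy as np

def edit_and_refit(fit, residuals, data, factor=3.0, floor=1e-9, min_points=4):
    """Refit repeatedly, removing the point with the largest residual
    while that residual exceeds factor * median residual + floor.

    fit(data) returns a model; residuals(model, data) returns one
    non-negative error per data row. Returns the final model and the
    indices of the rows that were kept.
    """
    keep = np.arange(len(data))
    while True:
        model = fit(data[keep])
        errors = residuals(model, data[keep])
        worst = int(np.argmax(errors))
        limit = factor * np.median(errors) + floor
        if len(keep) <= min_points or errors[worst] <= limit:
            return model, keep
        keep = np.delete(keep, worst)
```

For the camera problem, `fit` would be the camera solution and `residuals` the image-plane distance between each measured point and its projection.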

A promising technique under development [F&B]
forms analytically exact camera position and
orientation models from subsets of the point data,
then evaluates the points and potential models
together, before refining the least-squares model
from the reliable points, as above. This
technique appears to be an improvement, both
computationally and in terms of accuracy, over the present
method, and may well be the method of choice for
the eventual bootstrap stereo package. Since this
method is still under development, we are
proceeding with the more standard technique in our
feasibility study.

NEW  LANDMARK  SELECTION 

Because the aircraft rapidly moves beyond the
known landmarks, new landmark points must
constantly be established. For this purpose, "interesting
points" -- points with a high likelihood of being
matched [M] -- are selected in the old image of
each pair, then matched with their corresponding
points in the new image and located on the terrain.

Matching is done on the basis of the
normalized cross correlation between small windows of
data (typically 11 x 11) around the two points in
question. If the window to be matched contains
little information, it can correlate reasonably
well with any other area of similar low information.
To avoid mismatches from attempting to use such
areas, the simple statistical variance of the image
intensities over the window

$$\mathrm{var} = \underset{i,j}{\mathrm{MEAN}}\,\bigl(\mathrm{INT}(i,j) - \mathrm{MEAN}(\mathrm{INT})\bigr)^{2}$$

was used as an early measure of information [H],
with only areas of high information being
acceptable candidates for matching.

Matching also has trouble with strong linear
edges, since an otherwise featureless area
containing a strong edge will match equally well
anywhere along the edge. To reject such areas,
the notion of directed variance was introduced
[M]. Four quantities are calculated over the
window:

$$\mathrm{dirvar1} = \underset{i,j}{\mathrm{MEAN}}\,\bigl(\mathrm{INT}(i,j) - \mathrm{INT}(i+1,j)\bigr)^{2}$$
$$\mathrm{dirvar2} = \underset{i,j}{\mathrm{MEAN}}\,\bigl(\mathrm{INT}(i,j) - \mathrm{INT}(i,j+1)\bigr)^{2}$$
$$\mathrm{dirvar3} = \underset{i,j}{\mathrm{MEAN}}\,\bigl(\mathrm{INT}(i,j) - \mathrm{INT}(i+1,j+1)\bigr)^{2}$$
$$\mathrm{dirvar4} = \underset{i,j}{\mathrm{MEAN}}\,\bigl(\mathrm{INT}(i+1,j) - \mathrm{INT}(i,j+1)\bigr)^{2}$$

Directed variance is defined to be the
minimum of these four quantities. Points with
poor visual texture will have low directed
variance because adjacent samples differ little in
any of the directions. Points with linear edges
will show low directed variance in the direction
of the edge. Conversely, points with high
directed variance should avoid both defects. Thus,
"interesting points" were defined [...].
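
As a concrete reading of the two measures, the sketch below computes the ordinary variance and the directed variance of a single window with NumPy. The function names are ours, and the array's first index is taken to be i and its second j; this is an illustration of the formulas as printed, not code from the study.

```python
import numpy as np

def window_variance(window):
    """Ordinary variance of the intensities in a window."""
    w = np.asarray(window, dtype=float)
    return np.mean((w - w.mean()) ** 2)

def directed_variance(window):
    """Minimum of the four mean squared neighbor differences."""
    w = np.asarray(window, dtype=float)
    candidates = (
        np.mean((w[:-1, :] - w[1:, :]) ** 2),     # INT(i,j) - INT(i+1,j)
        np.mean((w[:, :-1] - w[:, 1:]) ** 2),     # INT(i,j) - INT(i,j+1)
        np.mean((w[:-1, :-1] - w[1:, 1:]) ** 2),  # INT(i,j) - INT(i+1,j+1)
        np.mean((w[1:, :-1] - w[:-1, 1:]) ** 2),  # INT(i+1,j) - INT(i,j+1)
    )
    return min(candidates)
```

A window containing a single straight edge aligned with one of the four directions has a large ordinary variance but zero directed variance, which is the rejection behavior the text describes.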

An indicator of the presence of an edge in the
window is the ratio of the directional variances
taken in perpendicular
pairs. This measure takes advantage of the fact
that a window with a strong edge will have much
greater information content across the edge than
along it. Because this measure does not give an
indication of low information, we have combined it
with ordinary variance to form an interest measure
we call edged variance.

[Equation defining edged variance; illegible in the source scan.]

A comparison of those three interest measures
is shown for three images
taken over the Night Vision Laboratory terrain model.
The images in the first column show peaks in
ordinary variance, the second
column shows peaks in directed variance, and the third
column shows peaks in edged variance. Note that
ordinary and directed variance favor points along
the strong, irregular edges at the tree/grass
boundaries, and ignore areas having
more subtle features. Edged variance, on the
other hand, gives a combination of strong and
subtle features, while avoiding excessively plain
areas. For the selection of landmarks, edged
variance is the interest measure of choice.

POINT  MATCHING 

Matching of points in an image
pair is done on the basis of normalized cross
correlation over small windows surrounding the
points. Given an approximation to the
displacement which describes the match, a simple spiraling
grid search is a fairly efficient way to refine
the precise match [H]. To provide that initial
approximation, we have employed a form of
reduction matching [H], [M].
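
The refinement step can be sketched as follows. This is a hedged illustration only: the ring-by-ring visiting order stands in for the spiraling grid search, the search is exhaustive within a fixed `radius`, and the names `ncc`, `spiral_offsets`, and `refine_match` and their default parameters are assumptions, not details taken from [H].

```python
import numpy as np

def ncc(a, b):
    """Normalized cross correlation of two equally sized windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def spiral_offsets(radius):
    """Offsets (dy, dx) visited ring by ring outward from (0, 0)."""
    yield 0, 0
    for r in range(1, radius + 1):
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if max(abs(dy), abs(dx)) == r:
                    yield dy, dx

def refine_match(first, second, point, guess, half=5, radius=4):
    """Find the window near `guess` in `second` that best matches the
    window centered on `point` in `first` (half=5 gives 11 x 11)."""
    py, px = point
    target = first[py - half:py + half + 1,
                   px - half:px + half + 1].astype(float)
    best_score, best_pos = -2.0, None
    for dy, dx in spiral_offsets(radius):
        y, x = guess[0] + dy, guess[1] + dx
        if y - half < 0 or x - half < 0:
            continue  # window would fall off the top or left edge
        candidate = second[y - half:y + half + 1,
                           x - half:x + half + 1].astype(float)
        if candidate.shape != target.shape:
            continue  # window would fall off the bottom or right edge
        score = ncc(target, candidate)
        if score > best_score:
            best_score, best_pos = score, (y, x)
    return best_pos, best_score
```

The better the initial approximation `guess`, the smaller `radius` can be, which is why the text goes on to describe a way of obtaining that approximation.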

We first create a hierarchy of reduced images.
Each N x N square of pixels at one level is averaged to form
a single pixel at the next level. (For the
example shown, N = 3.) The reduction process is
repeated at each level until the image
becomes approximately the size of the correlation
windows being used.
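
A short sketch of this reduction, assuming NumPy arrays for images, is given below. The stopping rule (stop when one more reduction would make the image smaller than the window) and the names `reduce_image` and `build_hierarchy` are our own choices for the example.

```python
import numpy as np

def reduce_image(image, n=3):
    """Average each n x n square of pixels into a single pixel.

    Rows and columns that do not fill a complete square are trimmed.
    """
    rows = (image.shape[0] // n) * n
    cols = (image.shape[1] // n) * n
    trimmed = np.asarray(image, dtype=float)[:rows, :cols]
    return trimmed.reshape(rows // n, n, cols // n, n).mean(axis=(1, 3))

def build_hierarchy(image, n=3, window=11):
    """Full-size image first, then successive n x n reductions, stopping
    when a further reduction would be smaller than the window."""
    levels = [np.asarray(image, dtype=float)]
    while min(levels[-1].shape) // n >= window:
        levels.append(reduce_image(levels[-1], n))
    return levels
```

For a 99 x 99 image with N = 3 and an 11 x 11 window this yields three levels of 99, 33, and 11 pixels on a side.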

Matching then begins at the smallest images.
A window centered on the first image is matched via the
spiral search, beginning at the center of the
second image. Thereafter, each matched point
spawns four points around itself, offset by half
a window radius along the diagonals of the window.
These points are carried to the next level of images,
taking their parent's displacement (suitably
magnified) as their displacement approximation.
These points then have their displacements refined
by a spiraling search before spawning new points.
This process (illustrated in Figure [...]) continues
until the largest images are reached, effectively
setting up a grid of control points for matching.

Having this initialization, it was intended that
further matches be approximated by the
displacement of the nearest grid point, then
refined via the spiral search. We discovered,
however, that an occasional point could be lost in
the process of carrying the grid down the image
hierarchy, either because the match disappeared
over the edge of the image, because of low
information in an area, or because relief-induced
distortion caused a match to be unreliable (as
determined by autocorrelation thresholding [H]).
Consequently, using the closest point required
searching through the grid [...] to approximate the
displacement [...]. Aerial imagery
usually does not present any reversals in
displacement, so it is reasonable to approximate the DX
and DY components of the displacement by fitting
polynomials in X and Y to the grid displacements [Q]. For each
further point to be matched, its location in the
first image is used to evaluate the two
polynomials, producing an estimate of the position of the
matching point in the second image.

When we used first-order polynomials for this
approximation, the residual errors were on the
order of a pixel, and reliable matches which had
been initialized from these polynomials differed
by as much as 3 pixels from the predicted
displacement. Using second-order polynomials resulted in
residual errors on the order of .3 pixel, and
reliable matches differed by less than 2 pixels
from the predicted displacement. Since this was
deemed adequate for initializing the local match
search, higher-order polynomials were not tried.
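
The polynomial prediction can be illustrated with an ordinary least-squares fit. The sketch below assumes the full set of monomials X^p Y^q with p + q no greater than the order; the text does not list the terms actually used, and the function names are hypothetical.

```python
import numpy as np

def design_matrix(x, y, order=2):
    """Columns x**p * y**q for every p + q <= order."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([x ** p * y ** q
                            for p in range(order + 1)
                            for q in range(order + 1 - p)])

def fit_displacement(x, y, dx, dy, order=2):
    """Least-squares polynomial coefficients for the DX and DY fields."""
    a = design_matrix(x, y, order)
    coeff_dx = np.linalg.lstsq(a, np.asarray(dx, dtype=float), rcond=None)[0]
    coeff_dy = np.linalg.lstsq(a, np.asarray(dy, dtype=float), rcond=None)[0]
    return coeff_dx, coeff_dy

def predict_match(x, y, coeff_dx, coeff_dy, order=2):
    """Estimated second-image position of the point (x, y) in the first."""
    a = design_matrix([x], [y], order)
    return x + (a @ coeff_dx)[0], y + (a @ coeff_dy)[0]
```

Here `x`, `y`, `dx`, and `dy` would come from the reliable grid matches, and the predicted position would seed the local match search.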

In bootstrap stereo,
reduction matching is used to determine
approximate registration of the images and to initialize
the second-order match prediction polynomials.
Matching of old landmarks and of interesting
points to create new landmarks uses these
polynomials to predict an approximate match, which is
then refined by a local search. Autocorrelation
thresholding is used to test the reliability of
the match, then points are located more closely
than the image grid permits by parabolic
interpolation of the X- and Y-slices of the correlation
values.
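
Parabolic interpolation of a correlation peak can be sketched as below: a parabola is passed through the peak value and its two neighbors along each axis, and the vertex gives the sub-pixel offset. The function names and the list-of-rows layout of `corr` are assumptions for the example.

```python
def parabolic_peak(left, center, right):
    """Vertex position, relative to the center sample, of the parabola
    through three equally spaced samples. Returns 0.0 if the samples
    are collinear."""
    denom = left - 2.0 * center + right
    if denom == 0.0:
        return 0.0
    return 0.5 * (left - right) / denom

def subpixel_match(corr, y, x):
    """Refine the integer peak (y, x) of a correlation surface using
    its Y-slice and X-slice separately."""
    offset_y = parabolic_peak(corr[y - 1][x], corr[y][x], corr[y + 1][x])
    offset_x = parabolic_peak(corr[y][x - 1], corr[y][x], corr[y][x + 1])
    return y + offset_y, x + offset_x
```

When the center sample is the largest of the three, the returned offset lies between -0.5 and 0.5 pixel.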


CONTROL  POINT  POSITIONING 


Given the positions and orientations of two
cameras and the locations of corresponding
point-pairs in the two image planes, the real-world
location of each viewed ground point can be determined.
Rays through the two
image plane points are simply projected into space
(see Figure 6). Since these rays rarely
intersect exactly, we find their points of closest
approach and average them.
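
A minimal sketch of this calculation, assuming each ray is given by a camera position and a direction vector in world coordinates, is shown below. It uses the standard closed-form solution for the closest points on two lines; the function name and the parallel-ray tolerance are our own.

```python
import numpy as np

def closest_approach_midpoint(origin1, direction1, origin2, direction2):
    """Midpoint of the shortest segment joining two rays, and the
    length of that segment (the miss distance)."""
    p1 = np.asarray(origin1, dtype=float)
    d1 = np.asarray(direction1, dtype=float)
    p2 = np.asarray(origin2, dtype=float)
    d2 = np.asarray(direction2, dtype=float)
    w = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b
    if denom < 1e-12 * a * c:
        raise ValueError("rays are parallel or nearly so")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    q1 = p1 + s * d1
    q2 = p2 + t * d2
    return (q1 + q2) / 2.0, np.linalg.norm(q1 - q2)
```

The returned miss distance is a natural quantity to threshold when deciding whether a match is good enough to keep.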

If the distance between the points of closest
approach is large or the real-world
point is unreasonably different from its neighbors,
the point is rejected as having resulted from a
bad match. Otherwise, this point joins the list
of control points for future matching and camera
calibration.

AN  EXAMPLE 

In Figure 7, we present an example of the
control-point handling portion of bootstrap stereo.
The original data set, a sequence of 3 images from
a Night Vision Laboratory tape, is shown in
Figure 7a.

Figure 7b shows the interesting points in the
first image, indicated by + overlays. If these
were the control points from a landmark processor,
we would use them to locate the first camera.
These landmark points are then matched with their
corresponding points in the second image; Figure
7c shows the successful matches overlaid on the
first and second images. From the image plane
positions of these points, the position and
orientation of the second camera are determined.

Next, the areas of the second image which
were covered by matches are blocked out and
interesting points are found in the uncovered
areas, as seen in Figure 7d. The old landmark
points and the interesting points are then matched
in the third image, as shown in Figure 7e. The
old control points from the second image are used
to calibrate the third camera; the camera
calibrations are then used to locate the matched
interesting points on the ground, forming new
control points.

These last two steps are repeated for
subsequent pairs of images in longer sequences.

VULNERABILITY

The bootstrapping process is far from
infallible. The errors to which it is vulnerable fall
into roughly four categories:

1)  Loss  of  overlapped  imagery. 

2)  Errors  in  matching  control  points. 

3)  Errors  in  camera  calibration. 

4)  Errors  in  control-point  positioning. 

The bootstrapping process could lose its
needed overlap in imagery if the terrain over
which the AAV is flying is obscured by clouds.
If the terrain is essentially featureless (for
example, a large body of water or a desert), then
the bootstrapper will be unable to find and match
sufficient control points to continue. Similarly,
several dropped frames resulting from a temporary
equipment failure could cause the bootstrapping
process to abort.


A human navigator faced with clouds or
featureless terrain would simply shift mental
gears and fly on instruments and dead reckoning
until more favorable terrain was found. He would
then attempt to locate new landmarks from which
to re-orient himself. Mimicking this, when stereo
bootstrapping loses its overlapped imagery, the
Stereo Specialist just reports failure to the
Navigation Expert, which then proceeds to rely on
its Dead Reckoning Specialist until its Landmarks
Specialist can recognize some new landmarks from
which to re-orient the Stereo Specialist for
bootstrapping. Until then, the Stereo Specialist
processes any available imagery to extract ground
velocity for the Dead Reckoning Specialist.

The other three problems rarely cause the
bootstrapping process to fail completely.
Instead, they interact to create errors in the
vehicle positions determined by bootstrapping.
Since these are somewhat inevitable, it is
anticipated that the Landmarks Specialist will be
invoked periodically to search for checkpoint
landmarks. Course corrections will then be
determined from these checkpoints, and the bootstrapper
will be re-initialized.

Gross errors in match, which can occur due
to repetitive textures or moving objects, are
likely to be caught by the autocorrelation
thresholding or by the depth consistency or camera model
consistency requirements. Small errors in match,
such as would result from improper sub-pixel
registration of the data, can slip through; these
will bias the camera calibrations and control
point locations slightly.

Errors in camera calibrations result either
from errors in the data or from insufficient
precision in the calibration calculations. The
latter can be designed out of the system by
ensuring that the processor has sufficient
word-length and/or floating-point precision to handle
matrix inversion. Errors in the data are most
likely to affect the camera position, as its
orientation is fairly well known from the vehicle
orientation reported by the Instrument Specialist.
Techniques exist [T] for the adjustment of the
image data points along with the camera parameters,
to produce more consistent results.

Errors in the positions of the original
landmarks will be propagated through the bootstrapping
chain. Such errors should be static, however,
and should only result in a small perturbation
in the vehicle location. Errors in control point
positioning are interrelated with errors in the
match point location and the camera calibration.
Using the redundancy inherent in multiple images
[M] can resolve some of these uncertainties.

The manner in which these errors accumulate
is complex and not readily amenable to analytic
examination. One of our tasks in the coming
months will be to examine these errors in
simulation. We plan to generate sets of known ground
points and known camera locations and orientations.
For each camera position, the visible ground points
will be mathematically projected into the
simulated focal plane, and the camera localization
and control point positioning portions of the
bootstrapping process will be run. It will be
possible to perturb the data at each step in the
process, so we can analyze the effects of various
errors on the bootstrap procedure by comparing
the calculated camera and checkpoint positions to
the true positions which generated the original
data.

CONCLUSIONS 

When an autonomous aerial vehicle must
navigate without using external signals or
radiated energy, a visual navigator is an enticing
possibility. We have proposed a Navigation
Expert capable of emulating the behavior of an
early barnstorming pilot in using terrain imagery.
One tool such a Navigation Expert could use is
bootstrap stereo. This is a technique by which
the vehicle's position is determined from the
perceived positions of known landmarks, then two
known camera positions are used to locate
real-world points which serve as new landmarks.

The components of bootstrap stereo are well
established in the photogrammetry and image
processing literature. We have combined these, with
improvements, into a workable system. We are
continuing to work on the error analysis, to
determine how the errors propagate and accumulate.


REFERENCES 

[D&H]  Duda, R. O. and P. E. Hart, Pattern
Classification and Scene Analysis, John
Wiley and Sons, New York, New York, 1973.

[F&B]  Fischler, M. A. and R. C. Bolles, "Random
Sampling Consensus", Proceedings:
Image Understanding Workshop, College
Park, Maryland, April 30, 1980.

[G]  Gennery, D. G., "A Stereo Vision System
for an Autonomous Vehicle", Proceedings
of the 5th IJCAI, Cambridge, Massachusetts,
1977.

[H]  Hannah, M. J., Computer Matching of Areas
in Stereo Imagery, Ph.D. Thesis, AIM #239,
Computer Science Department, Stanford
University, California, 1974.

[M]  Moravec, H. P., "Visual Mapping by a Robot
Rover", Proceedings of the 6th IJCAI, Tokyo,
Japan, 1979.

[Q]  Quam, L. H., Computer Comparison of Pictures,
Ph.D. Thesis, AIM #144, Computer Science
Department, Stanford University, California,
1971.

[T]  Thompson, M. M., Manual of Photogrammetry,
American Society of Photogrammetry, Falls
Church, Virginia, 1944.


Figure 1  Components of the Navigation Expert.
(Legible diagram labels: relative altitude, velocity, terrain modeling.)


A hierarchy of images, each
the 3 x 3 reduction of its
parent.


Matching points found by
expanding a grid through
reduction image hierarchy


Figure 6  Point Position Calculation
(Diagram labels: Camera 1 position, Camera 1 orientation, Camera 2
position, Camera 2 orientation, points of closest approach.)


The points (S,T) and (U,V) are projected through their respective
cameras. Their intersection (X,Y,Z) is defined to be the
midpoint between the points of closest approach for these rays.




STATUS  REPORT 
ON  THE 

ADVANCED  FLEXIBLE  PROCESSOR 
BY 

G.R.  Allen 


Information  Sciences  Division 
Control  Data  Corporation 


A major milestone was reached in early
February in the development of the
Advanced Flexible Processor (AFP).
Construction of the first AFP was completed
and checkout was initiated on the AFP
and interface to the system controller
(PDP 11/70). All of the people who
labored for several years to design,
simulate, and fabricate the AFP were
extremely pleased to have reached this
milestone. An appropriate celebration
was held to mark the occasion. More
good news - testing and debugging the
machine has gone extremely well.


The design has now been fully
verified. No major design errors have
been uncovered in the checkout process
(6-7 days per week, 3 shifts per day)
as of April 2, 1980. All available
diagnostics (over 2000 lines of code)
have been run successfully with the
hardware passing all tests. The first
set of user application code has been
run and is now nearly operational.


Construction of the second unit
(to be delivered to Carnegie-Mellon
University) is proceeding in parallel
with the checkout effort; however,
this is taking second place to
getting the first machine working.
The cabinet construction is complete
and the cooling system is starting
into final test. With few exceptions,
the printed circuit boards and
integrated circuits for the CMU
machine are on hand. Following
completion of major testing on the first
AFP, changes will be installed in
the CMU unit and checkout will be
initiated.


Work  has  also  been  started  on 
a  nine-processor  system  to  be  completed 
in  mid-1981.  This  system  will  provide 
a  computer  capability  of  over  2 
billion  arithmetic  operations  per 
second. 


Progress in bringing up the AFP
has been phenomenal and can be attributed
to three factors. The machine was
fully simulated at the gate level,
including gate and wire delays. Proven
technology was used in the construction,
including power distribution, freon
cooling, printed circuit board
construction, and integrated circuits.
An instruction-level simulator was
developed to model the hardware, which
eliminated the problem of debugging
untested programs on untested hardware.


